Abstract We present an improved model and theory for
time-causal and time-recursive spatio-temporal receptive fields,
obtained by a combination of Gaussian receptive fields over
the spatial domain and first-order integrators or equivalently
truncated exponential filters coupled in cascade over the
temporal domain.

Compared to previous spatio-temporal scale-space formulations
in terms of non-enhancement of local extrema or
scale invariance, these receptive fields are based on different
scale-space axiomatics over time by ensuring non-creation
of new local extrema or zero-crossings with increasing
temporal scale. Specifically, extensions are presented concerning:
(i) parameterizing the intermediate temporal scale levels,
(ii) analysing the resulting temporal dynamics, (iii) transferring the
theory to a discrete implementation in terms of recursive
filters over time, (iv) computing scale-normalized spatio-temporal
derivative expressions for spatio-temporal feature
detection and (v) computational modelling of receptive fields
in the lateral geniculate nucleus (LGN) and the primary
visual cortex (V1) in biological vision.

We show that by distributing the intermediate temporal
scale levels according to a logarithmic distribution, we obtain
a new family of temporal scale-space kernels with better
temporal characteristics compared to a more traditional approach
of using a uniform distribution of the intermediate
temporal scale levels. Specifically, the new family of
time-causal kernels has much faster temporal response properties
(shorter temporal delays) compared to the kernels obtained
from a uniform distribution. When increasing the number of
temporal scale levels, the temporal scale-space kernels in the
new family do also converge very rapidly to a limit kernel
possessing true self-similar scale-invariant properties over
temporal scales. Thereby, the new representation allows for
true scale invariance over variations in the temporal scale,
although the underlying temporal scale-space representation
is based on a discretized temporal scale parameter.

We show how scale-normalized temporal derivatives can 
be defined for these time-causal scale-space kernels and how 
the composed theory can be used for computing basic types 
of scale-normalized spatio-temporal derivative expressions 
in a computationally efficient manner. 

Keywords Scale space • Receptive field • Scale • Spatial •
Temporal • Spatio-temporal • Scale-normalized derivative •
Scale invariance • Differential invariant • Natural image
transformations • Feature detection • Computer vision •
Computational modelling • Biological vision

1 Introduction 

Spatio-temporal receptive fields constitute an essential concept
for describing neural functions in biological vision (Hubel
and Wiesel [31, 32, 33]; DeAngelis et al. [12, 11]) and for
expressing computer vision methods on video data (Adelson
and Bergen [1]; Zelnik-Manor and Irani [99]; Laptev and
Lindeberg [43]; Jhuang et al. [35]; Shabani et al. [88]).

For off-line processing of pre-recorded video, non-causal
Gaussian or Gabor-based spatio-temporal receptive fields may
in some cases be sufficient. When operating on video data
in a real-time setting or when modelling biological vision
computationally, one does however need to take into explicit
account the fact that the future cannot be accessed and that
the underlying spatio-temporal receptive fields must therefore
be time-causal, i.e., the image operations should only
require access to image data from the present moment and
what has occurred in the past. For computational efficiency
and for keeping down memory requirements, it is also desirable
that the computations should be time-recursive, so that
it is sufficient to keep a limited memory of the past that can
be recursively updated over time.

The subject of this article is to present an improved temporal
scale-space model for spatio-temporal receptive fields
based on time-causal temporal scale-space kernels in terms
of first-order integrators or equivalently truncated exponential
filters coupled in cascade, which can be transferred to
a discrete implementation in terms of recursive filters over
discretized time. This temporal scale-space model will then
be combined with a Gaussian scale-space concept over continuous
image space or a genuinely discrete scale-space concept
over discrete image space, resulting in both continuous
and discrete spatio-temporal scale-space concepts for modelling
time-causal and time-recursive spatio-temporal receptive
fields over both continuous and discrete spatio-temporal
domains. The model builds on previous work by Fleet and
Langley [20], Lindeberg and Fagerström [66] and Lindeberg
[56, 57, 58] and is here complemented by: (i) a better design
for the degrees of freedom in the choice of time constants
for the intermediate temporal scale levels from the original
signal to any higher temporal scale level in a cascade structure
of temporal scale-space representations over multiple
temporal scales, (ii) an analysis of the resulting temporal
response dynamics, (iii) details for discrete implementation in
a spatio-temporal visual front-end, (iv) details for computing
spatio-temporal image features in terms of scale-normalized
spatio-temporal differential expressions at different
spatio-temporal scales and (v) computational modelling of receptive
fields in the lateral geniculate nucleus (LGN) and the
primary visual cortex (V1) in biological vision.

In previous use of the temporal scale-space model in
(Lindeberg and Fagerström [66]), a uniform distribution of
the intermediate scale levels has mostly been chosen when
coupling first-order integrators or equivalently truncated
exponential kernels in cascade. By instead using a logarithmic
distribution of the intermediate scale levels, we will here
show that a new family of temporal scale-space kernels can
be obtained with much better properties in terms of: (i) faster
temporal response dynamics and (ii) fast convergence towards
a limit kernel that possesses true scale-invariant properties
(self-similarity) under variations in the temporal scale
in the input data. Thereby, the new family of kernels enables:
(i) significantly shorter temporal delays (as always
arise for truly time-causal operations), (ii) much better
computational approximation to true temporal scale invariance
and (iii) computationally much more efficient numerical
implementation. Conceptually, our approach is also related to
the time-causal scale-time model by Koenderink [39], which
is here complemented by a truly time-recursive formulation
of time-causal receptive fields more suitable for real-time
operations over a compact temporal buffer of what has occurred
in the past, including a theoretically well-founded
and computationally efficient method for discrete implementation.

Specifically, the rapid convergence of the new family of
temporal scale-space kernels to a limit kernel when the number
of intermediate temporal scale levels tends to infinity is
theoretically very attractive, since it provides a way to define
truly scale-invariant operations over temporal variations at
different temporal scales, and to measure the deviation from
true scale invariance when approximating the limit kernel
by a finite number of temporal scale levels. Thereby, the
proposed model allows for truly self-similar temporal operations
over temporal scales while using a discretized temporal
scale parameter, which is a theoretically new type of
construction for temporal scale spaces.

Based on a previously established analogy between
scale-normalized derivatives for spatial derivative expressions and
the interpretation of scale normalization of the corresponding
Gaussian derivative kernels to constant $L_p$-norms over
scale (Lindeberg), we will show how scale-invariant
temporal derivative operators can be defined for the proposed
new families of temporal scale-space kernels. Then,
we will apply the resulting theory for computing basic
spatio-temporal derivative expressions of different types and
describe classes of such spatio-temporal derivative expressions
that are invariant or covariant to basic types of natural image
transformations, including independent rescaling of the
spatial and temporal coordinates, illumination variations and
variabilities in exposure control mechanisms.

In these ways, the proposed theory will present previously
missing components for applying scale-space theory
to spatio-temporal input data (video) based on truly
time-causal and time-recursive image operations.

A conceptual difference between the time-causal temporal
scale-space model that is developed in this paper and
Koenderink's fully continuous scale-time model [39] or the
fully continuous time-causal semi-group derived by Fagerström
[16] and Lindeberg [56] is that the presented time-causal
scale-space model will be semi-discrete, with a continuous
time axis and discretized temporal scale parameter.
This semi-discrete theory can then be further discretized
over time (and for spatio-temporal image data also over space)
into a fully discrete theory for digital implementation. The
reason why the temporal scale parameter has to be discrete
in this theory is that according to theoretical results about
variation-diminishing linear transformations by Schoenberg
[81, 82, 83, 84, 85, 86, 87] and Karlin [36] that we will build
upon, there is no continuous parameter semi-group structure
or continuous parameter cascade structure that guarantees
non-creation of new structures with increasing temporal
scale in terms of non-creation of new local extrema or
new zero-crossings over a continuum of increasing temporal
scales.

When discretizing the temporal scale parameter into a
discrete set of temporal scale levels, we do however show
that there exists such a discrete parameter semi-group structure
in the case of a uniform distribution of the temporal
scale levels and a discrete parameter cascade structure in
the case of a logarithmic distribution of the temporal scale
levels, which both guarantee non-creation of new local extrema
or zero-crossings with increasing temporal scale. In
addition, the presented semi-discrete theory allows for an
efficient time-recursive formulation for real-time implementation
based on a compact temporal buffer, which Koenderink's
scale-time model [39] does not, and much better
temporal dynamics than the time-causal semi-group previously
derived by Fagerström [16] and Lindeberg [56].

Specifically, we argue that if the goal is to construct a
vision system that analyses continuous video streams in real
time, as is the main scope of this work, a restriction of the
theory to a discrete set of temporal scale levels, with the
temporal scale levels determined in advance before the image
data are sampled over time, is less of a practical constraint,
since the vision system anyway has to be based on a finite
number of sensors and a finite amount of hardware/wetware
for sampling and processing the continuous stream of image data.


1.1 Structure of this article 

To give a contextual overview of this work, section 2 starts
by presenting a previously established computational model
for spatio-temporal receptive fields in terms of spatial and
temporal scale-space kernels, based on which we will replace
the temporal smoothing step.

Section 3 starts by reviewing previous theoretical results
for temporal scale-space models based on the assumption
of non-creation of new local extrema with increasing
scale, showing that the canonical temporal operators in such
a model are first-order integrators or equivalently truncated
exponential kernels coupled in cascade. Relative to previous
applications of this idea based on a uniform distribution of
the intermediate temporal scale levels, we present a conceptual
extension of this idea based on a logarithmic distribution
of the intermediate temporal scale levels, and show that
this leads to a new family of kernels that have faster temporal
response properties and correspond to more skewed
distributions, with the degree of skewness determined by a
distribution parameter $c$.

Section 4 analyses the temporal characteristics of these
kernels and shows that they lead to faster temporal characteristics
in terms of shorter temporal delays, including how
the choice of distribution parameter $c$ affects these characteristics.
In section 5 we present a more detailed analysis
of these kernels, with emphasis on the limit case when the
number of intermediate scale levels $K$ tends to infinity, and
on constructions that lead to true self-similarity and
scale invariance over a discrete set of temporal scaling factors.

Section 6 shows how these spatial and temporal kernels
can be transferred to a discrete implementation while
preserving scale-space properties also in the discrete implementation
and allowing for efficient computations of spatio-temporal
derivative approximations. Section 7 develops a
model for defining scale-normalized derivatives for the proposed
temporal scale-space kernels, which also leads to a
way of measuring how far from the scale-invariant time-causal
limit kernel a particular temporal scale-space kernel
is when using a finite number $K$ of temporal scale levels.

In section 8 we combine these components for computing
spatio-temporal features defined from different types of
spatio-temporal differential invariants, including an analysis
of their invariance or covariance properties under natural
image transformations, with specific emphasis on independent
scalings of the spatial and temporal dimensions, illumination
variations and variations in exposure control mechanisms.
Finally, section 9 concludes with a summary and
discussion, including a description of relations and differences
to other temporal scale-space models.

To simplify the presentation, we have put some of the
theoretical analysis in the appendix. Appendix A presents a
frequency analysis of the proposed time-causal scale-space
kernels, including a detailed characterization of the limit
case when the number of temporal scale levels $K$ tends to
infinity and explicit expressions for their moment (cumulant)
descriptors up to order four. Appendix B presents a comparison
with the temporal kernels in Koenderink's scale-time
model, including a minor modification of Koenderink's
model to make the temporal kernels normalized to unit
$L_1$-norm and a mapping between the parameters in his model (a
temporal offset $\delta$ and a dimensionless amount of smoothing
$\sigma$ relative to a logarithmic time scale) and the parameters in
our model (the temporal variance $\tau$, a distribution parameter
$c$ and the number of temporal scale levels $K$), including
graphs of similarities vs. differences between these models.
Appendix C shows that for the temporal scale-space representation
given by convolution with the scale-invariant time-causal
limit kernel, the corresponding scale-normalized derivatives
become fully scale covariant/invariant for temporal scaling
transformations that correspond to exact mappings between
the discrete temporal scale levels.

This paper is a much further developed version of a conference
paper [62] presented at SSVM 2015, with substantial
additions concerning:

- the theory that implies that the temporal scales have to
be discrete (sections 3.1-3.2),

- more detailed modelling of biological receptive fields
(section 3.6),

- the construction of a truly self-similar and scale-invariant
time-causal limit kernel (section 5),

- theory for implementation in terms of discrete time-causal
scale-space kernels (section 6.1),

- details concerning more rotationally symmetric implementation
over the spatial domain (section 6.3),

- definition of scale-normalized temporal derivatives for
the resulting time-causal scale-space (section 7),

- a framework for spatio-temporal feature detection based
on time-causal and time-recursive spatio-temporal scale
space, including scale normalization as well as covariance
and invariance properties under natural image transformations
and experimental results (section 8),

- a frequency analysis of the time-causal and time-recursive
scale-space kernels (appendix A),

- a comparison between the presented semi-discrete model
and Koenderink's fully continuous model, including comparisons
between the temporal kernels in the two models
and a mapping between the parameters in our model and
Koenderink's model (appendix B) and

- a theoretical analysis of the evolution properties over
scales of temporal derivatives obtained from the time-causal
limit kernel, including the scaling properties of
the scale normalization factors under $L_p$-normalization
and a proof that the resulting scale-normalized derivatives
become scale invariant/covariant (appendix C).

In relation to the SSVM 2015 paper, this paper therefore
first shows how the presented framework applies to spatio-temporal
feature detection and computational modelling of
biological vision, which could not be fully described because
of space limitations, and then presents important theoretical
extensions in terms of theoretical properties (scale
invariance) and theoretical analysis as well as other technical
details that could not be included in the conference paper
because of space limitations.

2 Spatio-temporal receptive fields 

The theoretical structure that we start from is a general result 
from axiomatic derivations of a spatio-temporal scale-space 
based on assumptions of non-enhancement of local extrema 
and the existence of a continuous temporal scale parameter, 
which states that the spatio-temporal receptive fields should 
be based on spatio-temporal smoothing kernels of the form 
(see overviews in Lindeberg [56, 57]):

$$T(x_1, x_2, t;\; s, \tau;\; v, \Sigma) = g(x_1 - v_1 t, x_2 - v_2 t;\; s, \Sigma) \, h(t;\; \tau) \tag{1}$$

where

- $x = (x_1, x_2)^T$ denotes the image coordinates,

- $t$ denotes time,

- $s$ denotes the spatial scale,

- $\tau$ denotes the temporal scale,

- $v = (v_1, v_2)^T$ denotes a local image velocity,

- $\Sigma$ denotes a spatial covariance matrix determining the
spatial shape of an affine Gaussian kernel
$g(x;\; s, \Sigma) = \frac{1}{2 \pi s \sqrt{\det \Sigma}} e^{-x^T \Sigma^{-1} x / 2s}$,

- $g(x_1 - v_1 t, x_2 - v_2 t;\; s, \Sigma)$ denotes a spatial affine
Gaussian kernel that moves with image velocity $v = (v_1, v_2)$
in space-time and

- $h(t;\; \tau)$ is a temporal smoothing kernel over time.

A biological motivation for this form of separability between
the smoothing operations over space and time can also be
obtained from the facts that (i) most receptive fields in the
retina and the LGN are to a first approximation space-time
separable and (ii) the receptive fields of simple cells in V1
can be either space-time separable or inseparable, where the
simple cells with inseparable receptive fields exhibit receptive
field subregions that are tilted in the space-time domain
and the tilt is an excellent predictor of the preferred direction
and speed of motion (DeAngelis et al. [12, 11]).

For simplicity, we shall here restrict the above family of
affine Gaussian kernels over the spatial domain to rotationally
symmetric Gaussians of different size $s$, by setting the
covariance matrix $\Sigma$ to a unit matrix. We shall also mainly
restrict ourselves to space-time separable receptive fields by
setting the image velocity $v$ to zero.
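
The restricted kernel family can be illustrated with a minimal sketch, assuming Python with NumPy. It evaluates equation (1) with the covariance matrix set to the unit matrix and, purely as a placeholder assumption, a single truncated exponential as the temporal kernel h. The function names `gaussian_2d`, `truncated_exp` and `receptive_field` are hypothetical and not taken from the paper.

```python
import numpy as np

def gaussian_2d(x1, x2, s):
    """Rotationally symmetric 2-D Gaussian with variance s (unit covariance matrix)."""
    return np.exp(-(x1**2 + x2**2) / (2.0 * s)) / (2.0 * np.pi * s)

def truncated_exp(t, mu):
    """Time-causal truncated exponential with time constant mu; zero for t < 0."""
    t = np.asarray(t, dtype=float)
    # np.maximum avoids overflow of exp() for negative t before masking.
    return np.where(t >= 0.0, np.exp(-np.maximum(t, 0.0) / mu) / mu, 0.0)

def receptive_field(x1, x2, t, s, mu, v=(0.0, 0.0)):
    """Equation (1) with Sigma = I: g(x1 - v1 t, x2 - v2 t; s) * h(t).

    Assumption: h is a single truncated exponential (a cascade with K = 1).
    """
    return gaussian_2d(x1 - v[0] * t, x2 - v[1] * t, s) * truncated_exp(t, mu)

# Space-time separable case (v = 0) and a velocity-adapted case.
separable_peak = receptive_field(0.0, 0.0, 0.0, s=1.0, mu=1.0)
moving_peak = receptive_field(0.6, 0.0, 2.0, s=1.0, mu=1.0, v=(0.3, 0.0))
before_onset = receptive_field(0.0, 0.0, -1.0, s=1.0, mu=1.0)
```

With a non-zero velocity, the spatial peak follows the line x = v t in space-time, which is the tilt referred to for inseparable receptive fields; the kernel is identically zero for t < 0.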

A conceptual difference that we shall pursue is to relax
the requirement of a semi-group structure over a continuous
temporal scale parameter in the above axiomatic derivations
to a weaker Markov property over a discrete temporal
scale parameter. We shall also replace the previous axiom
about non-creation of new image structures with increasing
scale in terms of non-enhancement of local extrema (which
requires a continuous scale parameter) by the requirement
that the temporal smoothing process, when seen as an operation
along a one-dimensional temporal axis only, must not
increase the number of local extrema or zero-crossings in the
signal. Then, another family of time-causal scale-space kernels
becomes permissible and uniquely determined, in terms
of first-order integrators or truncated exponential filters
coupled in cascade.

The main topics of this paper are to handle the remaining
degrees of freedom resulting from this construction concerning:
(i) choosing and parameterizing the distribution of temporal
scale levels, (ii) analysing the resulting temporal dynamics,
(iii) describing how this model can be transferred to a discrete
implementation over discretized time, space or both
while retaining discrete scale-space properties, (iv) using
the resulting theory for computing scale-normalized spatio-temporal
derivative expressions for purposes in computer vision
and (v) computational modelling of biological vision.

3 Time-causal temporal scale-space

When constructing a system for real-time processing of sensor
data, a fundamental constraint on the temporal smoothing
kernels is that they have to be time-causal. The ad hoc
solution of using a truncated symmetric filter of finite temporal
extent in combination with a temporal delay is not
appropriate in a time-critical context. Because of computational
and memory efficiency, the computations should furthermore
be based on a compact temporal buffer that contains
sufficient information for representing the sensor information
at multiple temporal scales and computing features
therefrom. Corresponding requirements are necessary
in computational modelling of biological perception.


3.1 Time-causal scale-space kernels for pure temporal 
domain 

To model the temporal component of the smoothing operation
in equation (1), let us initially consider a signal $f(t)$
defined over a one-dimensional continuous temporal axis
$t \in \mathbb{R}$. To define a one-parameter family of temporal
scale-space representations from this signal, we consider a
one-parameter family of smoothing kernels $h(t;\; \tau)$ where $\tau > 0$
is the temporal scale parameter

$$L(t;\; \tau) = \int_{u=0}^{\infty} h(u;\; \tau) \, f(t-u) \, du \tag{2}$$

and $L(t;\; 0) = f(t)$. To formalize the requirement that
this transformation must not introduce new structures from a
finer to a coarser temporal scale, let us following Lindeberg
[45] require that between any pair of temporal scale levels
$\tau_2 > \tau_1 > 0$ the number of local extrema at scale $\tau_2$ must
not exceed the number of local extrema at scale $\tau_1$. Let us
additionally require the family of temporal smoothing kernels
$h(u;\; \tau)$ to obey the following cascade relation

$$h(\cdot;\; \tau_2) = (\Delta h)(\cdot;\; \tau_1 \mapsto \tau_2) * h(\cdot;\; \tau_1) \tag{3}$$

between any pair of temporal scales $(\tau_1, \tau_2)$ with $\tau_2 > \tau_1$ for
some family of transformation kernels $(\Delta h)(t;\; \tau_1 \mapsto \tau_2)$.
Note that in contrast to most other axiomatic scale-space 
definitions, we do, however, not impose a strict semi-group 
property on the kernels. The motivation for this is to make it 
possible to take larger scale steps at coarser temporal scales, 
which will give higher flexibility and enable the construction 
of more efficient temporal scale-space representations. 

Following Lindeberg [45], let us further define a scale-space
kernel as a kernel that guarantees that the number
of local extrema in the convolved signal can never exceed
the number of local extrema in the input signal. Equivalently,
this condition can be expressed in terms of the number
of zero-crossings in the signal. Following Lindeberg and
Fagerström [66], let us additionally define a temporal scale-space
kernel as a kernel that both satisfies the temporal causality
requirement $h(t;\; \tau) = 0$ if $t < 0$ and guarantees that the
number of local extrema does not increase under convolution.
If both the raw transformation kernels $h(u;\; \tau)$ and the
cascade kernels $(\Delta h)(t;\; \tau_1 \mapsto \tau_2)$ are scale-space kernels,
we do hence guarantee that the number of local extrema in
$L(t;\; \tau_2)$ can never exceed the number of local extrema in
$L(t;\; \tau_1)$. If the kernels $h(u;\; \tau)$ and additionally the cascade
kernels $(\Delta h)(t;\; \tau_1 \mapsto \tau_2)$ are temporal scale-space
kernels, these kernels do hence constitute natural kernels for
defining a temporal scale-space representation.

3.2 Classification of scale-space kernels for continuous 
signals 

Interestingly, the classes of scale-space kernels and temporal
scale-space kernels can be completely classified based on
classical results by Schoenberg and Karlin regarding the theory
of variation-diminishing linear transformations. Schoenberg
studied this topic in a series of papers over about 20
years (Schoenberg [81, 82, 83, 84, 85, 86, 87]) and Karlin [36]
then wrote an excellent monograph on the topic of total
positivity.

Variation diminishing transformations. Summarizing main
results from this theory in a form relevant to the construction
of the scale-space concept for one-dimensional continuous
signals (Lindeberg [48, section 3.5.1]), let $S^-(f)$ denote the
number of sign changes in a function $f$

$$S^-(f) = \sup V^-(f(t_1), f(t_2), \dots, f(t_J)), \tag{4}$$

where the supremum is extended over all sets $t_1 < t_2 < \dots < t_J$
($t_j \in \mathbb{R}$), $J$ is arbitrary but finite, and $V^-(v)$
denotes the number of sign changes in a vector $v$. Then, the
transformation

$$f_{out}(\eta) = \int_{\xi=-\infty}^{\infty} f_{in}(\eta - \xi) \, dG(\xi), \tag{5}$$

where $G$ is a distribution function (essentially the primitive
function of a convolution kernel), is said to be variation-diminishing
if

$$S^-(f_{out}) \leq S^-(f_{in}) \tag{6}$$

holds for all continuous and bounded $f_{in}$. Specifically, the
transformation (5) is variation diminishing if and only if
$G$ has a bilateral Laplace-Stieltjes transform of the form
(Schoenberg [85])

$$\int_{\xi=-\infty}^{\infty} e^{-s\xi} \, dG(\xi) = C \, e^{\gamma s^2 + \delta s} \prod_{i=1}^{\infty} \frac{e^{a_i s}}{1 + a_i s} \tag{7}$$

for $-c < \operatorname{Re}(s) < c$ and some $c > 0$, where $C \neq 0$, $\gamma \geq 0$,
$\delta$ and $a_i$ are real, and $\sum_{i=1}^{\infty} a_i^2$ is convergent.


Classes of continuous scale-space kernels. Interpreted in the
temporal domain, this result implies that for continuous signals
there are four primitive types of linear and shift-invariant
smoothing transformations: convolution with the Gaussian
kernel,

$$h(\xi) = e^{-\gamma \xi^2}, \tag{8}$$

convolution with the truncated exponential functions,

$$h(\xi) = \begin{cases} e^{-|\lambda| \xi} & \xi \geq 0, \\ 0 & \xi < 0, \end{cases} \qquad h(\xi) = \begin{cases} e^{|\lambda| \xi} & \xi \leq 0, \\ 0 & \xi > 0, \end{cases} \tag{9}$$

as well as trivial translation and rescaling. Moreover, it means
that a shift-invariant linear transformation is variation diminishing
if and only if it can be decomposed into these
primitive operations.
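
The variation-diminishing property (6) can be checked numerically with a small sketch, assuming Python with NumPy. Two things here are assumptions and not part of the text above: the helper names `sign_changes` and `first_order_recursive`, and the use of a discrete first-order recursive filter (whose impulse response is a one-sided geometric sequence) as a stand-in for the continuous truncated exponential kernel.

```python
import numpy as np

def sign_changes(values):
    """Number of sign changes in a sequence, ignoring exact zeros (V^- in eq. (4))."""
    signs = np.sign(values)
    signs = signs[signs != 0]
    return int(np.sum(signs[1:] != signs[:-1]))

def first_order_recursive(signal, mu):
    """Discrete first-order integrator with time constant mu and zero initial state.

    Assumed discretization: out[n] = out[n-1] + (in[n] - out[n-1]) / (1 + mu).
    """
    out = np.empty(len(signal), dtype=float)
    state = 0.0
    for n, value in enumerate(signal):
        state += (value - state) / (1.0 + mu)
        out[n] = state
    return out

rng = np.random.default_rng(0)
f_in = rng.standard_normal(500)          # noisy test signal with many sign changes
f_out = first_order_recursive(f_in, mu=2.0)

changes_in = sign_changes(f_in)
changes_out = sign_changes(f_out)
print(changes_in, changes_out)           # the second count should not exceed the first
```

Counting sign changes rather than local extrema keeps the check insensitive to how the finite signal is treated at its boundaries.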


3.3 Temporal scale-space kernels over continuous temporal 
domain 


In the above expressions, the first class of scale-space kernels
(8) corresponds to using a non-causal Gaussian scale-space
concept over time, which may constitute a straightforward
model for analysing pre-recorded temporal data in an
offline setting where temporal causality is not critical and
can be disregarded by the possibility of accessing the virtual
future in relation to any pre-recorded time moment.

Adding temporal causality as a necessary requirement,
and with additional normalization of the kernels to unit
$L_1$-norm to leave a constant signal unchanged, it follows that
the following family of truncated exponential kernels

$$h_{exp}(t;\; \mu_k) = \begin{cases} \frac{1}{\mu_k} e^{-t/\mu_k} & t \geq 0, \\ 0 & t < 0, \end{cases} \tag{10}$$

constitutes the only class of time-causal scale-space kernels
over a continuous temporal domain in the sense of guaranteeing
both temporal causality and non-creation of new local
extrema (or equivalently zero-crossings) with increasing
scale (Lindeberg [45]; Lindeberg and Fagerström [66]). The
Laplace transform of such a kernel is given by

$$H_{exp}(q;\; \mu_k) = \int_{t=-\infty}^{\infty} h_{exp}(t;\; \mu_k) \, e^{-qt} \, dt = \frac{1}{1 + \mu_k q} \tag{11}$$

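and coupling $K$ such kernels in cascade leads to a composed
kernel

$$h_{composed}(\cdot;\; \mu) = *_{k=1}^{K} h_{exp}(\cdot;\; \mu_k) \tag{12}$$

having a Laplace transform of the form

$$H_{composed}(q;\; \mu) = \int_{t=-\infty}^{\infty} \left( *_{k=1}^{K} h_{exp}(t;\; \mu_k) \right) e^{-qt} \, dt = \prod_{k=1}^{K} \frac{1}{1 + \mu_k q}. \tag{13}$$

The composed kernel has temporal mean and variance

$$m_K = \sum_{k=1}^{K} \mu_k, \qquad \tau_K = \sum_{k=1}^{K} \mu_k^2. \tag{14}$$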

In terms of physical models, repeated convolution with such
kernels corresponds to coupling a series of first-order integrators
with time constants $\mu_k$ in cascade

$$\partial_t L(t;\; \tau_k) = \frac{1}{\mu_k} \left( L(t;\; \tau_{k-1}) - L(t;\; \tau_k) \right) \tag{15}$$

with $L(t;\; 0) = f(t)$. In the sense of guaranteeing non-creation
of new local extrema or zero-crossings over time,
these kernels have a desirable and well-founded smoothing
property that can be used for defining multi-scale observations
over time. A constraint on this type of temporal scale-space
representation, however, is that the scale levels are required
to be discrete and that the scale-space representation
does hence not admit a continuous scale parameter.
Computationally, however, the scale-space representation based
on truncated exponential kernels can be highly efficient and
admits direct implementation in terms of hardware (or
wetware) that emulates first-order integration over time, and
where the temporal scale levels together also serve as a sufficient
time-recursive memory of the past (see figure 2).
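
The additivity of the mean and the variance in equation (14) can be illustrated numerically with a rough sketch, assuming Python with NumPy. The function names `composed_kernel` and `kernel_moments`, the sampling step `dt`, the time horizon `t_max` and the example time constants are all assumptions chosen for illustration; the discrete convolution only approximates the continuous cascade (12).

```python
import numpy as np

def composed_kernel(mus, dt=0.01, t_max=30.0):
    """Sample a cascade of truncated exponentials by repeated discrete convolution."""
    t = np.arange(0.0, t_max, dt)
    kernel = None
    for mu in mus:
        h = np.exp(-t / mu) / mu                     # samples of equation (10) for t >= 0
        if kernel is None:
            kernel = h
        else:
            # Riemann-sum approximation of the convolution integral, kept on the same grid.
            kernel = np.convolve(kernel, h)[: len(t)] * dt
    return t, kernel

def kernel_moments(t, kernel):
    """Mean and variance of a sampled kernel, after normalizing it to unit sum."""
    weights = kernel / kernel.sum()
    mean = float(np.sum(t * weights))
    variance = float(np.sum((t - mean) ** 2 * weights))
    return mean, variance

mus = [0.5, 0.5, 1.0]                                # hypothetical time constants
t, h = composed_kernel(mus)
mean, variance = kernel_moments(t, h)

expected_mean = sum(mus)                             # m_K in equation (14)
expected_variance = sum(mu**2 for mu in mus)         # tau_K in equation (14)
print(mean, expected_mean, variance, expected_variance)
```

The sampled mean comes out slightly below the sum of the time constants because of the coarse sampling; refining `dt` reduces that discrepancy.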


Fig. 2 Electric wiring diagram consisting of a set of resistors and
capacitors that emulate a series of first-order integrators coupled in
cascade, if we regard the time-varying voltage $f_{in}$ as representing the
time-varying input signal and the resulting output voltage $f_{out}$ as
representing the time-varying output signal at a coarser temporal scale.
According to the presented theory, the corresponding truncated
exponential kernels of time are the only primitive temporal smoothing
kernels that guarantee both temporal causality and non-creation of local
extrema (alternatively zero-crossings) with increasing temporal scale.
Such first-order temporal integration can be used as a straightforward
computational model for temporal processing in biological neurons
(see also Koch [37, Chapters 11-12] regarding physical modelling of
the information transfer in dendrites of neurons).


[Figure 1 panel columns: $h(t;\; \mu, K = 7)$, $h_t(t;\; \mu, K = 7)$, $h_{tt}(t;\; \mu, K = 7)$]

Fig. 1 Equivalent kernels with temporal variance $\tau = 1$ corresponding to the composition of $K = 7$ truncated exponential kernels in cascade and
their first- and second-order derivatives. (top row) Equal time constants $\mu$. (second row) Logarithmic distribution of the scale levels for $c = \sqrt{2}$.
(third row) Logarithmic distribution for $c = 2^{3/4}$. (bottom row) Logarithmic distribution for $c = 2$.


3.4 Distributions of the temporal scale levels 

When implementing this temporal scale-space concept, a set
of intermediate scale levels $\tau_k$ has to be distributed between
some minimum and maximum scale levels $\tau_{min} = \tau_1$ and
$\tau_{max} = \tau_K$. Next, we will present three ways of discretizing
the temporal scale parameter over $K$ temporal scale levels.

Uniform distribution of the temporal scales. If one chooses
a uniform distribution of the intermediate temporal scales

$$\tau_k = \frac{k}{K} \, \tau_{max} \tag{16}$$

then the time constants of all the individual smoothing steps
are given by

$$\mu_k = \sqrt{\frac{\tau_{max}}{K}}. \tag{17}$$

Logarithmic distribution of the temporal scales with free
minimum scale. More natural is to distribute the temporal
scale levels according to a geometric series, corresponding
to a uniform distribution in units of effective temporal scale
$\tau_{eff} = \log \tau$ (Lindeberg [47]). If we have a free choice of
what minimum temporal scale level $\tau_{min}$ to use, a natural
way of parameterizing these temporal scale levels is by using
a distribution parameter $c > 1$

$$\tau_k = c^{2(k-K)} \, \tau_{max} \quad (1 \leq k \leq K) \tag{18}$$

which by equation (14) implies that the time constants of the
individual first-order integrators should be given by

$$\mu_1 = c^{1-K} \sqrt{\tau_{max}} \tag{19}$$

$$\mu_k = \sqrt{\tau_k - \tau_{k-1}} = c^{k-K-1} \sqrt{c^2 - 1} \, \sqrt{\tau_{max}} \quad (2 \leq k \leq K) \tag{20}$$
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SIMPLE, 
Separable 

0 

0 6 0 5 



h xt (x,t\ s,t) 


—h X xt(x,t; s,t ) 





Fig. 4 Computational modelling of simple cells in the primary visual cortex (VI) as reported by DeAngelis et al. fill using idealized spatio- 
temporal receptive fields of the form T(x, /; s, r, v) = d x <* d t p g(x — vt\ s) h(t ; r) according to equatio n (lj l and with the temporal smoothing 
function h(t ; r) modelled as a cascade of first-order integrators/truncated exponential kernels of the form i | I 2 ] l (left column) Separable receptive 
fields corresponding to mixed derivatives of first- or second-order derivatives over space with first-order derivatives over time, (right column) 
Inseparable velocity-adapted receptive fields corresponding to second- or third-order derivatives over space. Parameter values: (a) h x t: rr x = 
0.6 degrees, at = 60 ms. (b) h X xt- &x = 0.6 degrees, at = 80 ms. (c) h xx : a x = 0.7 degrees, at = 50 ms, v = 0.007 degrees/ms. (d) h xxx : 
a x = 0.5 degrees, at = 80 ms, v = 0.004 degrees/ms. (Horizontal axis: Space x in degrees of visual angle. Vertical axis: Time t in ms.) 
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h X xt(x,t; s,t) 


-h xxt t(x,t] s,t) 



Fig. 3 Computational modelling of space-time separable receptive 
field profiles in the lateral geniculate nucleus (LGN) as reported by 
DeAngelis et al. Q2) using idealized spatio-temporal receptive fields 
of the form T(x,t\ s,t) = d x <*d t pg(\ s ) /i(t; r) according to equa¬ 
tion {TJ and with the temporal smoothing function h(t\ r) modelled as 
a cascade of first-order integrators/truncated exponential kernels of the 
form m (left) a “non-lagged cell” modelled using first-order tem¬ 
poral derivatives (right) a “lagged cell” modelled using second-order 
temporal derivatives. Parameter values: (a) h xx t'. = 0.5 degrees, 
at = 40 ms. (b) h xx tt : a x = 0.6 degrees, at = 60 ms. (Horizontal 
dimension: space x. Vertical dimension: time f.) 


using either a uniform distribution or a logarithmic distribu¬ 
tion of the intermediate scale levels. 

In general, these kernels are all highly asymmetric for 
small values of K , whereas the kernels based on a uniform 
distribution of the intermediate temporal scale levels become 
gradually more symmetric around the temporal maximum 
as K increases. The degree of continuity at the origin and 
the smoothness of transition phenomena increase with K 
such that coupling of K > 2 kernels in cascade implies 
a C K ~ 2 -continuity of the temporal scale-space kernel. To 
guarantee at least C 1 -continuity of the temporal derivative 
computation kernel at the origin, the order n of differenti¬ 
ation of a temporal scale-space kernel should therefore not 
exceed K — 2. Specifically, the kernels based on a logarith¬ 
mic distribution of the intermediate scale levels (i) have a 
higher degree of temporal asymmetry which increases with 
the distribution parameter c and (ii) allow for faster tempo¬ 
ral dynamics compared to the kernels based on a uniform 
distribution. 

In the case of a logarithmic distribution of the interme¬ 
diate temporal scale levels, the choice of the distribution pa¬ 
rameter c leads to a trade-off issue in that smaller values of 
c allow for a denser sampling of the temporal scale levels, 
whereas larger values of c lead to faster temporal dynamics 
and a more skewed shape of the temporal receptive fields 
with larger deviations from the shape of Gaussian deriva¬ 
tives of the same order. 


3.6 Computational modelling of biological receptive fields 


Logarithmic distribution of the temporal scales with given 
minimum scale. If the temporal signal is on the other hand 
given at some minimum temporal scale r m j„, we can instead 


determine c = 


( 5 =) 


2(K-1) 


in (18 i such that r-| = t„ 


and add K — 1 temporal scales wim/rfc according to (20 1 . 


Logarithmic memory of the past. When using a logarithmic 
distribution of the temporal scale levels according to either 
of the last two methods, the different levels in the tempo¬ 
ral scale-space representation at increasing temporal scales 
will serve as a logarithmic memory of the past, with qualita¬ 
tive similarity to the mapping of the past onto a logarithmic 
time axis in the scale-time model by Koenderink |39| . Such 
a logarithmic memory of the past can also be extended to 
later stages in the visual hierarchy. 


3.5 Temporal receptive fields 

Figure [I] shows graphs of such temporal scale-space kernels 
that correspond to the same value of the composed variance. 


Receptive fields in the LGN. Regarding visual receptive fields in the lateral
geniculate nucleus (LGN), DeAngelis et al. [12, 11] report that most neurons
(i) have approximately circular center-surround organization in the spatial
domain and that (ii) most of the receptive fields are separable in space-time.
There are two main classes of temporal responses for such cells: (i) a
"non-lagged cell" is defined as a cell for which the first temporal lobe is the
largest one (figure 3(left)), whereas (ii) a "lagged cell" is defined as a cell
for which the second lobe dominates (figure 3(right)).

Such temporal response properties are typical for first- and second-order
temporal derivatives of a time-causal temporal scale-space representation. For
the first-order temporal derivative of a time-causal temporal scale-space
kernel, the first peak is strongest, whereas the second peak may be the most
dominant one for second-order temporal derivatives. The spatial response, on
the other hand, shows a high similarity to a Laplacian of a Gaussian, leading
to an idealized receptive field model of the form (Lindeberg [57, equation
(108)])

$$h_{LGN}(x, y, t; s, \tau) = \pm (\partial_{xx} + \partial_{yy}) \, g(x, y; s) \, \partial_{t^n} h(t; \tau). \tag{21}$$

Figure 3 shows results of modelling separable receptive fields in the LGN in
this way, using a cascade of first-order integrators/truncated exponential
kernels of the form (12) for modelling the temporal smoothing function
$h(t; \tau)$.

Receptive fields in V1. Concerning the neurons in the primary visual cortex
(V1), DeAngelis et al. [12, 11] describe that their receptive fields are
generally different from the receptive fields in the LGN in the sense that they
are (i) oriented in the spatial domain and (ii) sensitive to specific stimulus
velocities. Cells (iii) for which there are precisely localized "on" and "off"
subregions with (iv) spatial summation within each subregion, (v) spatial
antagonism between on- and off-subregions and (vi) whose visual responses to
stationary or moving spots can be predicted from the spatial subregions are
referred to as simple cells (Hubel and Wiesel [31, 32]). In Lindeberg [57], an
idealized model of such receptive fields was proposed of the form

$$h_{simple-cell}(x_1, x_2, t; s, \tau, v, \Sigma) = (\cos\varphi \, \partial_{x_1} + \sin\varphi \, \partial_{x_2})^{m_1} \, (\sin\varphi \, \partial_{x_1} - \cos\varphi \, \partial_{x_2})^{m_2} \, (v_1 \partial_{x_1} + v_2 \partial_{x_2} + \partial_t)^n \, g(x_1 - v_1 t, x_2 - v_2 t; s \Sigma) \, h(t; \tau) \tag{22}$$

where

- $\partial_\varphi = \cos\varphi \, \partial_{x_1} + \sin\varphi \, \partial_{x_2}$ and
  $\partial_{\perp\varphi} = \sin\varphi \, \partial_{x_1} - \cos\varphi \, \partial_{x_2}$
  denote spatial directional derivative operators in two orthogonal directions
  $\varphi$ and $\perp\varphi$,

- $m_1 \geq 0$ and $m_2 \geq 0$ denote the orders of differentiation in the two
  orthogonal directions in the spatial domain with the overall spatial order of
  differentiation $m = m_1 + m_2$,

- $v_1 \partial_{x_1} + v_2 \partial_{x_2} + \partial_t$ denotes a
  velocity-adapted temporal derivative operator

and the meanings of the other symbols are similar to those explained in
connection with equation (1).

Figure 4 shows the result of modelling the spatio-temporal receptive fields of
simple cells in V1 in this way, using the general idealized model of
spatio-temporal receptive fields in equation (1) in combination with a temporal
smoothing kernel obtained by coupling a set of first-order integrators or
truncated exponential kernels in cascade. As can be seen from the figures, the
proposed idealized receptive field models reproduce the qualitative shape of
the neurophysiologically recorded biological receptive fields well.

These results complement the general theoretical model for visual receptive
fields in Lindeberg [57] by (i) temporal kernels that have better temporal
dynamics than the time-causal semi-group derived in Lindeberg [56] by
decreasing faster with time (decreasing exponentially instead of polynomially)
and with (ii) explicit modelling results and a theory (developed in more detail
in the following sections; see footnote 1) for choosing and parameterizing the
intermediate discrete temporal scale levels in the time-causal model.

With regard to a possible biological implementation of this theory, the
evolution properties of the presented scale-space models over scale and time
are governed by diffusion and difference equations (see equations (23)-(24) in
the next section), which can be implemented by operations over neighbourhoods
in combination with first-order integration over time. Hence, the computations
can naturally be implemented in terms of connections between different cells.
Diffusion equations are also used in mean field theory for approximating the
computations that are performed by populations of neurons (Omurtag et al. [76];
Mattia and Guidice [73]; Faugeras et al. [18]).

By combining the theoretical properties of these kernels regarding scale-space
properties between receptive field responses at different spatial and temporal
scales with their covariance properties under natural image transformations
(described in more detail in the next section), the proposed theory can be seen
as both a theoretically well-founded and a biologically plausible model for
time-causal and time-recursive spatio-temporal receptive fields.


3.7 Theoretical properties of time-causal spatio-temporal 
scale-space 

Under evolution of time and with increasing spatial scale, the corresponding
time-causal spatio-temporal scale-space representation generated by convolution
with kernels of the form (1), with specifically the temporal smoothing kernel
$h(t; \tau)$ defined as a set of truncated exponential kernels/first-order
integrators in cascade (12), obeys the following system of
differential/difference equations

$$\partial_s L = \frac{1}{2} \nabla_x^T (\Sigma \nabla_x L), \tag{23}$$

$$\partial_t L = -v^T (\nabla_x L) - \frac{1}{\mu_k} \delta_\tau L, \tag{24}$$

with the difference operator $\delta_\tau$ over temporal scale

$$(\delta_\tau L)(x, t; s, \tau_k; \Sigma, v) = L(x, t; s, \tau_k; \Sigma, v) - L(x, t; s, \tau_{k-1}; \Sigma, v). \tag{25}$$

1 The theoretical results following in section 5 state that temporal scale
covariance becomes possible using a logarithmic distribution of the temporal
scale levels. Section 4 states that the temporal response properties are faster
for a logarithmic distribution of the intermediate temporal scale levels
compared to a uniform distribution. If one has requirements about how fine the
temporal scale sampling needs to be or maximally allowed temporal delays, then
table 2 in section 4 provides constraints on permissible values of the
distribution parameter $c$. Finally, the quantitative criterion in section 7.4
(see table 5) states how many intermediate temporal scale levels are needed to
approximate temporal scale invariance up to a given accuracy.

Theoretically, the resulting spatio-temporal scale-space representation obeys
similar scale-space properties over the spatial domain as the two other
spatio-temporal scale-space models derived in Lindeberg [56, 57, 58] regarding
(i) linearity over the spatial domain, (ii) shift invariance over space,
(iii) semi-group and cascade properties over spatial scales,
(iv) self-similarity and scale covariance over spatial scales so that for any
uniform scaling transformation $(x', t')^T = (S x, t)^T$ the spatio-temporal
scale-space representations are related by
$L'(x', t'; s', \tau_k; \Sigma, v') = L(x, t; s, \tau_k; \Sigma, v)$ with
$s' = S^2 s$ and $v' = S v$ and (v) non-enhancement of local extrema with
increasing spatial scale.

If the family of receptive fields in equation (1) is defined over the full
group of positive definite spatial covariance matrices $\Sigma$ in the spatial
affine Gaussian scale-space [48, 69, 56], then the receptive field family also
obeys (vi) closedness and covariance under time-independent affine
transformations of the spatial image domain, $(x', t')^T = (A x, t)^T$,
implying $L'(x', t'; s, \tau_k; \Sigma', v') = L(x, t; s, \tau_k; \Sigma, v)$
with $\Sigma' = A \Sigma A^T$ and $v' = A v$, as resulting from, e.g., local
linearizations of the perspective mapping (with locality defined as over the
support region of the receptive field). When using rotationally symmetric
Gaussian kernels for smoothing, the corresponding spatio-temporal scale-space
representation does instead obey (vii) rotational invariance.

Over the temporal domain, convolution with these kernels obeys (viii) linearity
over the temporal domain, (ix) shift invariance over the temporal domain,
(x) temporal causality, (xi) cascade property over temporal scales,
(xii) non-creation of local extrema for any purely temporal signal.

If using a uniform distribution of the intermediate temporal scale levels, the
spatio-temporal scale-space representation obeys a (xiii) semi-group property
(see footnote 2) over discrete temporal scales. Due to the finite number of
discrete temporal scale levels, the corresponding spatio-temporal scale-space
representation cannot, however, for general values of the time constants
$\mu_k$ obey full self-similarity and scale covariance over temporal scales.
Using a logarithmic distribution of the

2 When using a uniform distribution of the intermediate temporal scale levels,
with temporal scale increment $\Delta\tau$ between adjacent temporal scale
levels, we can equivalently parameterize the temporal scale parameter by its
temporal scale index $k$. Then, any temporal scale level $k$ corresponding to
the composed temporal variance $\tau = k \, \Delta\tau$ is given by
$L(t; k) = ((*_{i=1}^{k} h_{exp}(\cdot; \Delta\tau)) * f(\cdot))(t)$ with the
composed convolution kernel
$h(\cdot; k) = *_{i=1}^{k} h_{exp}(\cdot; \Delta\tau)$ obeying the discrete
semi-group property $h(\cdot; k_1) * h(\cdot; k_2) = h(\cdot; k_1 + k_2)$.
Parameterized over the temporal scale parameter $\tau$, the semi-group property
does instead read
$h(\cdot; k_1 \Delta\tau) * h(\cdot; k_2 \Delta\tau) = h(\cdot; (k_1 + k_2) \Delta\tau)$,
where $\Delta\tau = \mu^2$ and $\mu$ is the time constant of the first-order
integrator.


temporal scale levels and an additional limit case construction to infinity, we
will however show in section 5 that it is possible to achieve
(xiv) self-similarity (41) and scale covariance (49) over the discrete set of
temporal scaling transformations $(x', t')^T = (x, c^j t)^T$ that precisely
corresponds to mappings between any pair of discretized temporal scale levels
as implied by the logarithmically distributed temporal scale parameter with
distribution parameter $c$.

Over the composed spatio-temporal domain, these kernels obey (xv) positivity
and (xvi) unit normalization in $L_1$-norm. The spatio-temporal scale-space
representation also obeys (xvii) closedness and covariance under local Galilean
transformations in space-time, in the sense that for any Galilean
transformation $(x', t')^T = (x - ut, t)^T$ with two video sequences related by
$f'(x', t') = f(x, t)$ their corresponding spatio-temporal scale-space
representations will be equal for corresponding parameter values
$L'(x', t'; s, \tau_k; \Sigma, v') = L(x, t; s, \tau_k; \Sigma, v)$ with
$v' = v - u$.

If additionally the velocity value $v$ and/or the spatial covariance matrix
$\Sigma$ can be adapted to the local image structures in terms of Galilean
and/or affine invariant fixed point properties [56, 64, 48, 69], then the
spatio-temporal receptive field responses can additionally be made
(xviii) Galilean invariant and/or (xix) affine invariant.


4 Temporal dynamics of the time-causal kernels 

For the time-causal filters obtained by coupling truncated exponential kernels
in cascade, there will be an inevitable temporal delay depending on the time
constants $\mu_k$ of the individual filters. A straightforward way of
estimating this delay is by using the additive property of mean values under
convolution, $m_K = \sum_{k=1}^{K} \mu_k$ according to (14). In the special
case when all the time constants are equal, $\mu_k = \sqrt{\tau/K}$, this
measure is given by

$$m_{uni} = \sqrt{K \tau} \tag{26}$$


showing that the temporal delay increases if the temporal 
smoothing operation is divided into a larger number of smaller 
individual smoothing steps. 

In the special case when the intermediate temporal scale levels are instead
distributed logarithmically according to (18), with the individual time
constants given by (19) and (20), this measure for the temporal delay is given
by

$$m_{log} = \frac{c^{-K} \left( c^2 - (\sqrt{c^2 - 1} + 1) \, c + \sqrt{c^2 - 1} \, c^K \right)}{c - 1} \, \sqrt{\tau} \tag{27}$$

with the limit value

$$m_{log-limit} = \lim_{K \to \infty} m_{log} = \sqrt{\frac{c + 1}{c - 1}} \, \sqrt{\tau} \tag{28}$$

when the number of filters tends to infinity.

By comparing equations (26) and (27), we can specifically note that with
increasing number of intermediate temporal scale levels a logarithmic
distribution of the intermediate scales implies shorter temporal delays than a
uniform distribution of the intermediate scales.
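The relations (18)-(20) and the delay measures (26)-(28) can be checked numerically with a few lines of code. The following is a minimal sketch in plain Python, not part of the original presentation; all function names are hypothetical, and the temporal scale is normalized to $\tau = 1$ by default so that the results are expressed in units of $\sigma = \sqrt{\tau}$.

```python
import math

def distribution_parameter(tau_min, tau_max, K):
    """Distribution parameter c such that tau_1 = tau_min in equation (18)."""
    return (tau_max / tau_min) ** (1.0 / (2 * (K - 1)))

def log_time_constants(K, c, tau=1.0):
    """Time constants mu_1..mu_K from equations (19)-(20), with tau_max = tau."""
    c = float(c)
    mus = [c ** (1 - K) * math.sqrt(tau)]
    for k in range(2, K + 1):
        mus.append(c ** (k - K - 1) * math.sqrt(c * c - 1.0) * math.sqrt(tau))
    return mus

def mean_delay_uniform(K, tau=1.0):
    """Temporal mean for K equal time constants, equation (26)."""
    return math.sqrt(K * tau)

def mean_delay_log(K, c, tau=1.0):
    """Temporal mean as the sum of the individual time constants."""
    return sum(log_time_constants(K, c, tau))

def mean_delay_log_closed_form(K, c, tau=1.0):
    """Closed-form expression for the temporal mean, equation (27)."""
    a = math.sqrt(c * c - 1.0)
    return c ** (-K) * (c * c - (a + 1.0) * c + a * c ** K) / (c - 1.0) * math.sqrt(tau)

def mean_delay_log_limit(c, tau=1.0):
    """Limit of the temporal mean when K tends to infinity, equation (28)."""
    return math.sqrt((c + 1.0) / (c - 1.0)) * math.sqrt(tau)

if __name__ == "__main__":
    for K in (2, 4, 8, 12):
        print(K, round(mean_delay_uniform(K), 3), round(mean_delay_log(K, 2.0), 3))
```

Summing the squares of the time constants returned by `log_time_constants` recovers the composed variance $\tau$, which is a convenient sanity check on the parameterization.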

Table 1 shows numerical values of these measures for different values of $K$
and $c$. As can be seen, the logarithmic distribution of the intermediate
scales allows for significantly faster temporal dynamics than a uniform
distribution.

Temporal mean values m of time-causal kernels

 K    m_uni    m_log (c = √2)    m_log (c = 2^(3/4))    m_log (c = 2)
 2    1.414    1.414             1.399                  1.366
 3    1.732    1.707             1.636                  1.549
 4    2.000    1.914             1.777                  1.641
 5    2.236    2.061             1.860                  1.686
 6    2.449    2.164             1.910                  1.709
 7    2.646    2.237             1.940                  1.721
 8    2.828    2.289             1.957                  1.726
 9    3.000    2.326             1.968                  1.729
10    3.162    2.352             1.974                  1.730
11    3.317    2.370             1.978                  1.731
12    3.464    2.383             1.980                  1.732

Table 1 Numerical values of the temporal delay in terms of the temporal mean
$m = \sum_{k=1}^{K} \mu_k$ in units of $\sigma = \sqrt{\tau}$ for time-causal
kernels obtained by coupling $K$ truncated exponential kernels in cascade in
the cases of a uniform distribution of the intermediate temporal scale levels
$\tau_k = k\tau/K$ or a logarithmic distribution $\tau_k = c^{2(k-K)} \tau$
with $c > 1$.

Additional temporal characteristics. Because of the asymmetric tails of the
time-causal temporal smoothing kernels, temporal delay estimation by the mean
value may however lead to substantial overestimates compared to e.g. the
position of the local maximum. To provide more precise characteristics, let us
first consider the case of a uniform distribution of the intermediate temporal
scales, for which a compact closed form expression is available for the
composed kernel and corresponding to the probability density function of the
Gamma distribution

$$h_{composed}(t; \mu, K) = \frac{t^{K-1} \, e^{-t/\mu}}{\mu^K \, \Gamma(K)}. \tag{29}$$

The temporal derivatives of these kernels relate to Laguerre functions
(Laguerre polynomials $p_n^{\alpha}(t)$ multiplied by a truncated exponential
kernel) according to Rodrigues' formula:

$$p_n^{\alpha}(t) = \frac{t^{-\alpha} e^{t}}{n!} \, \partial_{t^n} (t^{n+\alpha} e^{-t}). \tag{30}$$

Let us differentiate the temporal smoothing kernel

$$\partial_t (h_{composed}(t; \mu, K)) = \frac{e^{-t/\mu} \, ((K - 1)\mu - t) \, (t/\mu)^{K+1}}{t^3 \, \Gamma(K)} \tag{31}$$

and solve for the position of the local maximum

$$t_{max,uni} = \frac{(K - 1)}{\sqrt{K}} \, \sqrt{\tau}. \tag{32}$$

Table 2 shows numerical values for the position of the local maximum for both
types of time-causal kernels. As can be seen from the data, the temporal
response properties are significantly faster for a logarithmic distribution of
the intermediate scale levels compared to a uniform distribution and the
difference increases rapidly with $K$. These temporal delay estimates are also
significantly shorter than the temporal mean values, in particular for the
logarithmic distribution.

Temporal delays t_max from the maxima of time-causal kernels

 K    t_uni    t_log (c = √2)    t_log (c = 2^(3/4))    t_log (c = 2)
 2    0.707    0.707             0.688                  0.640
 3    1.154    1.122             1.027                  0.909
 4    1.500    1.385             1.199                  1.014
 5    1.789    1.556             1.289                  1.060
 6    2.041    1.669             1.340                  1.083
 7    2.268    1.745             1.370                  1.095
 8    2.475    1.797             1.388                  1.100
 9    2.667    1.834             1.398                  1.103
10    2.846    1.860             1.404                  1.104
11    3.015    1.879             1.408                  1.105
12    3.175    1.892             1.410                  1.106

Table 2 Numerical values for the temporal delay of the local maximum in units
of $\sigma = \sqrt{\tau}$ for time-causal kernels obtained by coupling $K$
truncated exponential kernels in cascade in the cases of a uniform distribution
of the intermediate temporal scale levels $\tau_k = k\tau/K$ or a logarithmic
distribution $\tau_k = c^{2(k-K)} \tau$ with $c > 1$.

If we consider a temporal event that occurs as a step function over time (e.g.
a new object appearing in the field of view) and if the time of this event is
estimated from the local maximum over time in the first-order temporal
derivative response, then the temporal variation in the response over time will
be given by the shape of the temporal smoothing kernel. The local maximum over
time will occur at a time delay equal to the time at which the temporal kernel
has its maximum over time. Thus, the position of the maximum over time of the
temporal smoothing kernel is highly relevant for quantifying the temporal
response dynamics.
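The position of the temporal maximum can also be estimated numerically by evaluating the composed kernel on a dense grid. The sketch below is an illustration under stated assumptions rather than the procedure used for the tables: it uses the Gamma form (29) for equal time constants and a standard partial-fraction expansion of the cascade for pairwise distinct time constants. The function names, the grid range and the grid resolution are hypothetical choices. Note that the partial-fraction form breaks down when two time constants coincide, which happens for instance for $c = \sqrt{2}$, where (19) and (20) give $\mu_1 = \mu_2$.

```python
import math

def gamma_kernel(t, mu, K):
    """Composed kernel (29) for K equal time constants mu (zero for t <= 0)."""
    if t <= 0.0:
        return 0.0
    return t ** (K - 1) * math.exp(-t / mu) / (mu ** K * math.gamma(K))

def cascade_kernel(t, mus):
    """Composed kernel of first-order integrators with pairwise distinct
    time constants, written as a partial-fraction sum of exponentials."""
    if t <= 0.0:
        return 0.0
    total = 0.0
    for k, mk in enumerate(mus):
        coeff = 1.0
        for j, mj in enumerate(mus):
            if j != k:
                coeff *= mk / (mk - mj)
        total += coeff * math.exp(-t / mk) / mk
    return total

def argmax_on_grid(kernel, t_end=10.0, n=100001):
    """Grid search for the time at which a one-argument kernel is maximal."""
    best_t, best_val = 0.0, float("-inf")
    for i in range(n):
        t = t_end * i / (n - 1)
        val = kernel(t)
        if val > best_val:
            best_t, best_val = t, val
    return best_t

def t_max_uniform(K, tau=1.0):
    """Closed-form position of the maximum for equal time constants, (32)."""
    return (K - 1) / math.sqrt(K) * math.sqrt(tau)

if __name__ == "__main__":
    K, tau = 4, 1.0
    mu = math.sqrt(tau / K)
    print(argmax_on_grid(lambda t: gamma_kernel(t, mu, K)), t_max_uniform(K, tau))
```

A grid search is used here only because it is easy to verify; a one-dimensional root finder applied to the derivative of the kernel would be a more economical alternative.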


5 The scale-invariant time-causal limit kernel 

In this section, we will show that in the case of a logarithmic distribution of
the intermediate temporal scale levels it is possible to extend the previous
temporal scale-space concept into a limit case that permits covariance under
temporal scaling transformations, corresponding to closedness of the temporal
scale-space representation under a compression or stretching of the temporal
scale axis by any integer power of the distribution parameter $c$.

Concerning the need for temporal scale invariance of a temporal scale-space
representation, let us first note that one could argue that this need is
different from the need for spatial scale invariance in a spatial scale-space
representation. Spatial scaling transformations always occur because of
perspective scaling effects caused by variations in the distances between
objects in the world and the observer, and therefore always need to be handled
by a vision system, whereas the temporal scale remains unaffected by the
perspective mapping from the scene to the image.

Temporal scaling transformations are nevertheless important because of physical
phenomena or spatio-temporal events occurring faster or slower. This is
analogous to another source of scale variability over the spatial domain,
caused by objects in the world having different physical size. To handle such
scale variabilities over the temporal domain, it is therefore desirable to
develop temporal scale-space concepts that allow for temporal scale invariance.


Fourier transform of temporal scale-space kernel. When using a logarithmic
distribution of the intermediate scale levels (18), the time constants of the
individual first-order integrators are given by (19) and (20). Thus, the
explicit expression for the Fourier transform obtained by setting
$q = i\omega$ in the expression (11) is of the form

$$\hat{h}_{exp}(\omega; \tau, c, K) = \frac{1}{1 + i \, c^{1-K} \sqrt{\tau} \, \omega} \prod_{k=2}^{K} \frac{1}{1 + i \, c^{k-K-1} \sqrt{c^2 - 1} \, \sqrt{\tau} \, \omega}. \tag{33}$$


Characterization in terms of temporal moments. Although the explicit expression
for the composed time-causal kernel may be somewhat cumbersome to handle for
any finite value of $K$, in appendix A.1 we show how one, based on a Taylor
expansion of the Fourier transform, can derive compact closed-form moment or
cumulant descriptors of these time-causal scale-space kernels. Specifically,
the limit values of the first-order moment $M_1$ and the higher-order central
moments up to order four when the number of temporal scale levels $K$ tends to
infinity are given by

$$\lim_{K \to \infty} M_1 = \sqrt{\frac{c + 1}{c - 1}} \, \tau^{1/2} \tag{34}$$

$$\lim_{K \to \infty} M_2 = \tau \tag{35}$$

$$\lim_{K \to \infty} M_3 = \frac{2 \, (c + 1) \sqrt{c^2 - 1} \, \tau^{3/2}}{c^2 + c + 1} \tag{36}$$

$$\lim_{K \to \infty} M_4 = \frac{3 \, (3c^2 - 1) \, \tau^2}{c^2 + 1} \tag{37}$$

and give a coarse characterization of the limit behaviour of these kernels
essentially corresponding to the terms in a Taylor expansion of the Fourier
transform up to order four. Following a similar methodology, explicit
expressions for higher-order moment descriptors can also be derived in an
analogous fashion, from the Taylor coefficients of higher order, if needed for
special purposes.
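The limit values (34)-(37) can be cross-checked numerically without going through the Taylor expansion of appendix A.1. The sketch below is an independent check, not the derivation used in the appendix: it relies on the standard facts that a truncated exponential kernel with time constant $\mu$ has cumulants $\kappa_n = (n-1)! \, \mu^n$ and that cumulants add under convolution. The time constants $c^{-k} \sqrt{c^2 - 1} \sqrt{\tau}$ are read off from the relabelled factors of the limit kernel, and the truncation to `n_terms` factors as well as all function names are assumptions of this sketch.

```python
import math

def limit_time_constants(c, tau=1.0, n_terms=200):
    """Time constants c^(-k) sqrt(c^2 - 1) sqrt(tau), k = 1..n_terms,
    of the relabelled cascade that defines the limit kernel."""
    a = math.sqrt(c * c - 1.0)
    return [c ** (-k) * a * math.sqrt(tau) for k in range(1, n_terms + 1)]

def moments_from_cascade(mus):
    """Mean and central moments of orders 2-4 of a cascade of truncated
    exponential kernels, using additivity of cumulants under convolution."""
    k1 = sum(mus)
    k2 = sum(m ** 2 for m in mus)
    k3 = 2.0 * sum(m ** 3 for m in mus)
    k4 = 6.0 * sum(m ** 4 for m in mus)
    # Central moments: M2 = k2, M3 = k3, M4 = k4 + 3 k2^2.
    return k1, k2, k3, k4 + 3.0 * k2 ** 2

def limit_moments_closed_form(c, tau=1.0):
    """Closed-form limit values according to equations (34)-(37)."""
    m1 = math.sqrt((c + 1.0) / (c - 1.0)) * math.sqrt(tau)
    m2 = tau
    m3 = 2.0 * (c + 1.0) * math.sqrt(c * c - 1.0) * tau ** 1.5 / (c * c + c + 1.0)
    m4 = 3.0 * (3.0 * c * c - 1.0) * tau ** 2 / (c * c + 1.0)
    return m1, m2, m3, m4

if __name__ == "__main__":
    for c in (math.sqrt(2.0), 2.0):
        print(moments_from_cascade(limit_time_constants(c)))
        print(limit_moments_closed_form(c))
```

For values of $c$ close to 1 the geometric series converges slowly, so `n_terms` would need to be increased accordingly.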

In figure 9 in appendix A.1 we show graphs of the corresponding skewness and
kurtosis measures as function of the distribution parameter $c$, showing that
both these measures increase with the distribution parameter $c$. In figure 12
in appendix B we provide a comparison between the behaviour of this limit
kernel and the temporal kernel in Koenderink's scale-time model, showing that
although the temporal kernels in these two models to a first approximation
share qualitatively coarsely similar properties in terms of their overall shape
(see figure 11 in appendix B), the temporal kernels in these two models differ
significantly in terms of their skewness and kurtosis measures.


The limit kernel. By letting the number of temporal scale levels $K$ tend to
infinity, we can define a limit kernel $\Psi(t; \tau, c)$ via the limit of the
Fourier transform (33) according to (and with the indices relabelled to better
fit the limit case):

$$\hat{\Psi}(\omega; \tau, c) = \lim_{K \to \infty} \hat{h}_{exp}(\omega; \tau, c, K) = \prod_{k=1}^{\infty} \frac{1}{1 + i \, c^{-k} \sqrt{c^2 - 1} \, \sqrt{\tau} \, \omega}. \tag{38}$$

By treating this limit kernel as an object by itself, which will be
well-defined because of the rapid convergence by the summation of variances
according to a geometric series, interesting relations can be expressed between
the temporal scale-space representations (see footnote 3)

$$L(t; \tau, c) = \int_{u=0}^{\infty} \Psi(u; \tau, c) \, f(t - u) \, du \tag{39}$$

obtained by convolution with this limit kernel.


3 Concerning the definition of the temporal scale-space representation (39)
obtained by convolution with the limit kernel (38), it should be noted that
although these definitions formally hold for any values of $\tau$ and $c$, the
information reducing property in terms of non-creation of new local extrema or
zero-crossings from finer to coarser scales is only guaranteed to hold if the
transformation between two temporal scale levels $\tau_2 > \tau_1$ can be
written in the form
$L(\cdot; \tau_2, c_2) = h(\cdot; (\tau_1, c_1) \mapsto (\tau_2, c_2)) * L(\cdot; \tau_1, c_1)$
with $h(\cdot; (\tau_1, c_1) \mapsto (\tau_2, c_2))$ being a temporal
scale-space kernel of the form (17). Such an information reducing property is
always guaranteed to hold for temporal scale levels of the form (40) with
$c_1 = c_2 = c$, but does in general not hold for arbitrary combinations of
$(\tau_1, c_1)$ and $(\tau_2, c_2)$. Therefore, the definitions (39) and (38)
are primarily intended to be applied over a discrete set of temporal scale
levels of the form (40).

Self-similar recurrence relation for the limit kernel over temporal scales.
Using the limit kernel, an infinite number of discrete temporal scale levels is
implicitly defined given the specific choice of one temporal scale
$\tau = \tau_0$:

$$\ldots, \frac{\tau_0}{c^6}, \frac{\tau_0}{c^4}, \frac{\tau_0}{c^2}, \tau_0, c^2 \tau_0, c^4 \tau_0, c^6 \tau_0, \ldots \tag{40}$$

Directly from the definition of the limit kernel, we obtain the following
recurrence relation between adjacent scales:

$$\Psi(\cdot; \tau, c) = h_{exp}\left(\cdot; \frac{\sqrt{c^2 - 1}}{c} \sqrt{\tau}\right) * \Psi\left(\cdot; \frac{\tau}{c^2}, c\right) \tag{41}$$

and in terms of the Fourier transform:

$$\hat{\Psi}(\omega; \tau, c) = \frac{1}{1 + i \, \frac{\sqrt{c^2 - 1}}{c} \sqrt{\tau} \, \omega} \, \hat{\Psi}\left(\omega; \frac{\tau}{c^2}, c\right). \tag{42}$$
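The recurrence relation (42) can be verified numerically by truncating the infinite product in (38) after a finite number of factors. The following sketch does this in plain Python; the truncation length `n_terms` and the function names are assumptions of this illustration, and the comparison is only meaningful up to floating-point accuracy and the (very small) contribution of the omitted factors.

```python
import math

def limit_kernel_ft(omega, tau, c, n_terms=200):
    """Truncated product approximation of the Fourier transform (38)."""
    a = math.sqrt(c * c - 1.0)
    value = 1.0 + 0.0j
    for k in range(1, n_terms + 1):
        value /= 1.0 + 1j * c ** (-k) * a * math.sqrt(tau) * omega
    return value

def first_order_ft(omega, mu):
    """Fourier transform of a truncated exponential kernel with time constant mu."""
    return 1.0 / (1.0 + 1j * mu * omega)

def recurrence_residual(omega, tau, c):
    """Absolute difference between the two sides of equation (42)."""
    lhs = limit_kernel_ft(omega, tau, c)
    mu = math.sqrt(c * c - 1.0) / c * math.sqrt(tau)
    rhs = first_order_ft(omega, mu) * limit_kernel_ft(omega, tau / (c * c), c)
    return abs(lhs - rhs)

if __name__ == "__main__":
    for omega in (0.3, 1.0, 5.0):
        print(omega, recurrence_residual(omega, 1.0, 2.0))
```

The residual is at the level of rounding errors, reflecting that the first factor of the product at scale $\tau$ is exactly the first-order integrator that maps the representation at scale $\tau/c^2$ to the representation at scale $\tau$.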


Behaviour under temporal rescaling transformations. From the Fourier transform
of the limit kernel (38), we can observe that for any temporal scaling factor
$S$ it holds that

$$\hat{\Psi}\left(\frac{\omega}{S}; S^2 \tau, c\right) = \hat{\Psi}(\omega; \tau, c). \tag{43}$$

Thus, the limit kernel transforms as follows under a scaling transformation of
the temporal domain:

$$S \, \Psi(S t; S^2 \tau, c) = \Psi(t; \tau, c). \tag{44}$$

If we for a given choice of distribution parameter $c$ rescale the input signal
$f$ by a scaling factor $S = 1/c$ such that $t' = t/c$, it then follows that
the scale-space representation of $f'$ at temporal scale $\tau' = \tau/c^2$

$$L'\left(t'; \frac{\tau}{c^2}, c\right) = \left(\Psi\left(\cdot; \frac{\tau}{c^2}, c\right) * f'(\cdot)\right)\left(t'; \frac{\tau}{c^2}, c\right) \tag{45}$$

will be equal to the temporal scale-space representation of the original signal
$f$ at scale $\tau$

$$L'(t'; \tau', c) = L(t; \tau, c). \tag{46}$$

Hence, under a rescaling of the original signal by a scaling factor $c$, a
rescaled copy of the temporal scale-space representation of the original signal
can be found at the next lower discrete temporal scale relative to the temporal
scale-space representation of the original signal.

Applied recursively, this result implies that the temporal scale-space
representation obtained by convolution with the limit kernel obeys a closedness
property over all temporal scaling transformations $t' = c^j t$ with temporal
rescaling factors $S = c^j$ ($j \in \mathbb{Z}$) that are integer powers of the
distribution parameter $c$,

$$L'(t'; \tau', c) = L(t; \tau, c) \quad \text{for } t' = c^j t \text{ and } \tau' = c^{2j} \tau, \tag{47}$$

allowing for perfect scale invariance over the restricted subset of scaling
factors that precisely matches the specific set of discrete temporal scale
levels that is defined by a specific choice of the distribution parameter $c$.
Based on this desirable and highly useful property, it is natural to refer to
the limit kernel as the scale invariant time-causal limit kernel.

Applied to the spatio-temporal scale-space representation defined by
convolution with a velocity-adapted affine Gaussian kernel
$g(x - vt; s, \Sigma)$ over space and the limit kernel $\Psi(t; \tau, c)$ over
time

$$L(x, t; s, \tau, c; \Sigma, v) = \int_{\eta \in \mathbb{R}^2} \int_{\zeta = 0}^{\infty} g(\eta - v\zeta; s, \Sigma) \, \Psi(\zeta; \tau, c) \, f(x - \eta, t - \zeta) \, d\eta \, d\zeta, \tag{48}$$

the corresponding spatio-temporal scale-space representation will then under a
scaling transformation of time $(x', t')^T = (x, c^j t)^T$ obey the closedness
property

$$L'(x', t'; s, \tau', c; \Sigma, v') = L(x, t; s, \tau, c; \Sigma, v) \tag{49}$$

with $\tau' = c^{2j} \tau$ and $v' = v/c^j$.
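The transformation property (44) can be illustrated in the time domain with a truncated approximation of the limit kernel. The sketch below is only an illustration under stated assumptions: the kernel is approximated by a finite cascade of `n_terms` first-order integrators with the relabelled time constants $c^{-k} \sqrt{c^2 - 1} \sqrt{\tau}$, evaluated through a standard partial-fraction sum of exponentials, and all names are hypothetical. Since every time constant of the truncated cascade scales with $\sqrt{\tau}$, the identity holds for the truncated kernel as well, so the check confirms the scaling behaviour but does not measure the truncation error.

```python
import math

def limit_kernel(t, tau, c, n_terms=12):
    """Truncated time-domain approximation of the limit kernel, written as a
    partial-fraction sum of truncated exponential kernels."""
    if t <= 0.0:
        return 0.0
    a = math.sqrt(c * c - 1.0)
    mus = [c ** (-k) * a * math.sqrt(tau) for k in range(1, n_terms + 1)]
    total = 0.0
    for k, mk in enumerate(mus):
        coeff = 1.0
        for j, mj in enumerate(mus):
            if j != k:
                coeff *= mk / (mk - mj)
        total += coeff * math.exp(-t / mk) / mk
    return total

def scaling_residual(t, tau, c, j):
    """Absolute difference between the two sides of (44) for S = c^j."""
    S = float(c) ** j
    return abs(S * limit_kernel(S * t, S * S * tau, c) - limit_kernel(t, tau, c))

if __name__ == "__main__":
    for j in (-2, -1, 1, 2):
        print(j, scaling_residual(1.0, 1.0, 2.0, j))
```

Restricting $S$ to integer powers of $c$ in this check mirrors the discrete set of temporal scale levels over which the closedness property (47) is stated.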


Self-similarity and scale invariance of the limit kernel. Combining the
recurrence relations of the limit kernel with its transformation property under
scaling transformations, it follows that the limit kernel can be regarded as
truly self-similar over scale in the sense that: (i) the scale-space
representation at a coarser temporal scale (here $\tau$) can be recursively
computed from the scale-space representation at a finer temporal scale (here
$\tau/c^2$) according to (41), (ii) the representation at the coarser temporal
scale is derived from the input in a functionally similar way as the
representation at the finer temporal scale and (iii) the limit kernel and its
Fourier transform are transformed in a self-similar way ((44) and (43)) under
scaling transformations.

In these respects, the temporal receptive fields arising from temporal
derivatives of the limit kernel share structurally similar mathematical
properties with continuous wavelets (Daubechies [10]; Heil and Walnut [30];
Mallat [71]; Misiti et al. [75]) and fractals (Mandelbrot [72]; Barnsley [5];
Barnsley and Rising [6]), with the conceptually novel extension here that the
scaling behaviour and self-similarity over scale is achieved over a time-causal
and time-recursive temporal domain.


6 Computational implementation 

The computational model for spatio-temporal receptive fields
presented here is based on spatio-temporal image data that
are assumed to be continuous over time. When implementing
this model on sampled video data, the continuous theory
must be transferred to discrete space and discrete time.

In this section we describe how the temporal and
spatio-temporal receptive fields can be implemented in terms of
corresponding discrete scale-space kernels that possess
scale-space properties over discrete spatio-temporal domains.

6.1 Classification of scale-space kernels for discrete signals

In section 3.2 we described how the class of continuous
scale-space kernels over a one-dimensional domain can be
classified based on classical results by Schoenberg regarding
the theory of variation-diminishing transformations as
applied to the construction of discrete scale-space theory in
Lindeberg [45], [48, section 3.3]. To later map the temporal
smoothing operation to theoretically well-founded discrete
scale-space kernels, we shall in this section describe the
corresponding classification result regarding scale-space kernels
over a discrete temporal domain.

Variation diminishing transformations. Let $v = (v_1, v_2, \ldots, v_n)$
be a vector of $n$ real numbers and let $V^-(v)$ denote the (minimum)
number of sign changes obtained in the sequence
$v_1, v_2, \ldots, v_n$ if all zero terms are deleted. Then, based on a
result by Schoenberg [84], the convolution transformation

$$f_{out}(t) = \sum_{n=-\infty}^{\infty} c_n \, f_{in}(t - n) \tag{50}$$

is variation-diminishing, i.e.

$$V^-(f_{out}) \leq V^-(f_{in}) \tag{51}$$

holds for all $f_{in}$, if and only if the generating function of the
sequence of filter coefficients
$\varphi(z) = \sum_{n=-\infty}^{\infty} c_n z^n$ is of the form

$$\varphi(z) = c \, z^k \, e^{(q_{-1} z^{-1} + q_1 z)} \prod_{i=1}^{\infty} \frac{(1 + \alpha_i z)(1 + \delta_i z^{-1})}{(1 - \beta_i z)(1 - \gamma_i z^{-1})} \tag{52}$$

where $c > 0$, $k \in \mathbb{Z}$,
$q_{-1}, q_1, \alpha_i, \beta_i, \gamma_i, \delta_i \geq 0$ and
$\sum_{i=1}^{\infty} (\alpha_i + \beta_i + \gamma_i + \delta_i) < \infty$.
Interpreted over the temporal domain, this
means that besides trivial rescaling and translation, there are
three basic classes of discrete smoothing transformations:

- two-point weighted average or generalized binomial smoothing

  $f_{out}(x) = f_{in}(x) + \alpha_i \, f_{in}(x - 1)$ ($\alpha_i \geq 0$),

  $f_{out}(x) = f_{in}(x) + \delta_i \, f_{in}(x + 1)$ ($\delta_i \geq 0$),

- moving average or first-order recursive filtering

  $f_{out}(x) = f_{in}(x) + \beta_i \, f_{out}(x - 1)$ ($0 \leq \beta_i < 1$),

  $f_{out}(x) = f_{in}(x) + \gamma_i \, f_{out}(x + 1)$ ($0 \leq \gamma_i < 1$),

- infinitesimal smoothing (footnote 4) or diffusion as arising from the
  continuous semi-groups made possible by the factor
  $e^{(q_{-1} z^{-1} + q_1 z)}$.

4 These kernels correspond to infinitely divisible distributions as can
be described with the theory of Lévy processes, where specifically
the case $q_{-1} = q_1$ corresponds to convolution with the non-causal
discrete analogue of the Gaussian kernel and the case $q_{-1} = 0$ to
convolution with the time-causal Poisson kernel.

To transfer the continuous first-order integrators derived in
section 3.3 to a discrete implementation, we shall in this
treatment focus on the first-order recursive filters, which by
additional normalization constitute both the discrete
correspondence and a numerical approximation of time-causal
and time-recursive first-order temporal integration (15).


6.2 Discrete temporal scale-space kernels based on
recursive filters

Given video data that has been sampled by some temporal
frame rate $r$, the temporal scale $\sigma_t$ in the continuous model
in units of seconds is first transformed to a variance $\tau$
relative to a unit time sampling

$$\tau = r^2 \sigma_t^2 \tag{55}$$

where $r$ may typically be either 25 fps or 50 fps. Then, a
discrete set of intermediate temporal scale levels $\tau_k$ is defined
by (18), or by the corresponding alternative distribution of the
scale levels, with the difference between successive scale
levels according to $\Delta\tau_k = \tau_k - \tau_{k-1}$ (with $\tau_0 = 0$).

For implementing the temporal smoothing operation
between two such adjacent scale levels (with the lower level in
each pair of adjacent scales referred to as $f_{in}$ and the upper
level as $f_{out}$), we make use of a first-order recursive filter
normalized to the form

$$f_{out}(t) - f_{out}(t - 1) = \frac{1}{1 + \mu_k} \left( f_{in}(t) - f_{out}(t - 1) \right) \tag{56}$$

and having a generating function of the form

$$H_{geom}(z) = \frac{1}{1 - \mu_k (z - 1)} \tag{57}$$


which is a time-causal kernel and satisfies discrete
scale-space properties of guaranteeing that the number of local
extrema or zero-crossings in the signal will not increase with
increasing scale (Lindeberg [45]; Lindeberg and Fagerström).
These recursive filters are the discrete analogue of the
continuous first-order integrators (15). Each primitive
recursive filter (56) has temporal mean value $m_k = \mu_k$ and
temporal variance $\Delta\tau_k = \mu_k^2 + \mu_k$, and we compute
$\mu_k$ from $\Delta\tau_k$ according to

$$\mu_k = \frac{\sqrt{1 + 4 \Delta\tau_k} - 1}{2}. \tag{58}$$


By the additive property of variances under convolution, the
discrete variances of the discrete temporal scale-space kernels
will perfectly match those of the continuous model,
whereas the mean values and the temporal delays may differ
somewhat. If the temporal scale $\tau_k$ is large relative to the
temporal sampling density, the discrete model should be a
good approximation in this respect.
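The recursion (56) with the time constants from (58) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the frame rate, the value of $\sigma_t$, the number of levels and the uniform spacing of the intermediate scale levels are hypothetical example choices, and the function names are introduced here only for the sketch.

```python
import math

def mu_from_dtau(dtau):
    """Time constant mu_k for a variance increment dtau, equation (58)."""
    return (math.sqrt(1.0 + 4.0 * dtau) - 1.0) / 2.0

def recursive_filter(f_in, mu):
    """One first-order recursive filter of the form (56), with f_out(-1) = 0."""
    out, prev = [], 0.0
    for x in f_in:
        prev = prev + (x - prev) / (1.0 + mu)
        out.append(prev)
    return out

def cascade(f_in, tau_levels):
    """Couple one filter per scale level in cascade (tau_levels increasing)."""
    signal, tau_prev = list(f_in), 0.0
    for tau_k in tau_levels:
        signal = recursive_filter(signal, mu_from_dtau(tau_k - tau_prev))
        tau_prev = tau_k
    return signal

def mean_and_variance(kernel):
    """Temporal mean and variance of a (non-negative) discrete kernel."""
    mass = sum(kernel)
    mean = sum(n * h for n, h in enumerate(kernel)) / mass
    var = sum((n - mean) ** 2 * h for n, h in enumerate(kernel)) / mass
    return mean, var

# Hypothetical example values: 50 fps video and sigma_t = 40 ms.
frame_rate, sigma_t = 50.0, 0.040
tau = (frame_rate * sigma_t) ** 2                     # equation (55)
K = 4
tau_levels = [tau * k / K for k in range(1, K + 1)]   # uniform levels (assumed)

delta = [1.0] + [0.0] * 399                           # discrete delta function
kernel = cascade(delta, tau_levels)                   # composed smoothing kernel
mean, var = mean_and_variance(kernel)
print(mean, var)
```

Filtering a discrete delta function gives the composed temporal smoothing kernel, whose variance should agree with $\tau$ by the additive property of variances, while its mean (the temporal delay) equals the sum of the $\mu_k$.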

By the time-recursive formulation of this temporal
scale-space concept, the computations can be performed based on
a compact temporal buffer over time, which contains the
temporal scale-space representations at temporal scales $\tau_k$
and with no need for storing any additional temporal buffer
of what has occurred in the past to perform the corresponding
temporal operations.

Concerning the actual implementation of these operations
computationally on signal processing hardware or software
with built-in support for higher order recursive filtering,
one can specifically note the following: If one is only
interested in the receptive field response at a single temporal
scale, then one can combine a set of $K'$ first-order recursive
filters (56) into a higher order recursive filter by multiplying
their generating functions (57)

$$H_{composed}(z) = \prod_{k=1}^{K'} \frac{1}{1 - \mu_k (z - 1)} = \frac{1}{a_0 + a_1 z + a_2 z^2 + \cdots + a_{K'} z^{K'}} \tag{59}$$

thus performing $K'$ recursive filtering steps by a single call
to the signal processing hardware or software. If using such
an approach, it should be noted, however, that depending on
the internal implementation of this functionality in the
signal processing hardware/software, the composed call (59)
may not be as numerically well-conditioned as the individual
smoothing steps (56), which are guaranteed to dampen
any local perturbations. In our Matlab implementation for
offline processing of this receptive field model, we have
therefore limited the number of compositions to $K' = 4$.
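As a sketch of the composition in (59), the following Python code multiplies the denominators of the generating functions (57) and applies the resulting higher order recursion in a single pass. It assumes that $z$ is read as a unit delay, which is the reading under which (57) reproduces the recursion (56). The variance increments and all function names are hypothetical choices made for this illustration.

```python
import math

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest order first)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def composed_coefficients(mus):
    """Denominator a_0 + a_1 z + ... of (59); each factor is (1 + mu_k) - mu_k z."""
    coeffs = [1.0]
    for mu in mus:
        coeffs = poly_mul(coeffs, [1.0 + mu, -mu])
    return coeffs

def composed_filter(f_in, coeffs):
    """Apply sum_j a_j f_out(t - j) = f_in(t) with zero initial state."""
    out = []
    for t, x in enumerate(f_in):
        acc = x
        for j in range(1, len(coeffs)):
            if t - j >= 0:
                acc -= coeffs[j] * out[t - j]
        out.append(acc / coeffs[0])
    return out

def cascade_filter(f_in, mus):
    """Reference: the same first-order filters (56) applied one after another."""
    signal = list(f_in)
    for mu in mus:
        prev, out = 0.0, []
        for x in signal:
            prev += (x - prev) / (1.0 + mu)
            out.append(prev)
        signal = out
    return signal

# Hypothetical variance increments for K' = 4 first-order filters.
mus = [(math.sqrt(1.0 + 4.0 * d) - 1.0) / 2.0 for d in (0.5, 0.5, 1.0, 2.0)]
coeffs = composed_coefficients(mus)
signal = [math.sin(0.3 * t) + (1.0 if t == 5 else 0.0) for t in range(60)]
max_diff = max(abs(a - b) for a, b in zip(composed_filter(signal, coeffs),
                                          cascade_filter(signal, mus)))
print(coeffs, max_diff)
```

For a small number of compositions the two outputs agree to rounding error. The conditioning concern raised above applies when many factors are multiplied together, which this small example does not probe.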


6.3 Discrete implementation of spatial Gaussian smoothing 

To implement the spatial Gaussian operation on discrete sampled
data, we first transform a spatial scale parameter $\sigma_x$
in units of e.g. degrees of visual angle to a spatial variance $s$
relative to a unit sampling density according to

$$s = p^2 \sigma_x^2 \tag{60}$$

where $p$ is the number of pixels per spatial unit, e.g. in terms
of degrees of visual angle at the image center. Then, we
convolve the image data with the separable two-dimensional
discrete analogue of the Gaussian kernel (Lindeberg)

$$T(n_1, n_2;\, s) = e^{-2s} I_{n_1}(s) \, I_{n_2}(s), \tag{61}$$

where $I_n$ denotes the modified Bessel functions of integer
order and which corresponds to the solution of the
semi-discrete diffusion equation

$$\partial_s L(n_1, n_2;\, s) = \frac{1}{2} (\nabla_5^2 L)(n_1, n_2;\, s), \tag{62}$$

where $\nabla_5^2$ denotes the five-point discrete Laplacian operator
defined by $(\nabla_5^2 f)(n_1, n_2) = f(n_1 - 1, n_2) + f(n_1 + 1, n_2) +
f(n_1, n_2 - 1) + f(n_1, n_2 + 1) - 4 f(n_1, n_2)$. These kernels
constitute the natural way to define a scale-space concept for
discrete signals corresponding to the Gaussian scale-space
over a symmetric domain.

This operation can be implemented either by explicit
spatial convolution with spatially truncated kernels

$$\sum_{n_1=-N}^{N} \sum_{n_2=-N}^{N} T(n_1, n_2;\, s) > 1 - \varepsilon \tag{63}$$

for small $\varepsilon$ of the order $10^{-8}$ to $10^{-6}$ with mirroring at
the image boundaries (adiabatic boundary conditions
corresponding to no heat transfer across the image boundaries) or
using the closed-form expression of the Fourier transform

$$\varphi_T(\theta_1, \theta_2) = \sum_{n_1=-\infty}^{\infty} \sum_{n_2=-\infty}^{\infty} T(n_1, n_2;\, s) \, e^{-i(n_1 \theta_1 + n_2 \theta_2)} = e^{-2s \left( \sin^2(\theta_1/2) + \sin^2(\theta_2/2) \right)}. \tag{64}$$
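A minimal sketch of the explicit-convolution route, restricted to one dimension (the two-dimensional kernel (61) is the product of two such one-dimensional kernels): the kernel $T(n;\, s) = e^{-s} I_n(s)$ is truncated by the one-dimensional analogue of criterion (63) and applied with mirroring at the boundaries. The power-series evaluation of the Bessel function, the half-sample mirroring convention and the function names are assumptions made for this illustration.

```python
import math

def bessel_i(n, s, terms=60):
    """Modified Bessel function I_n(s) of integer order via its power series
    (adequate for the moderate s used here, not a general-purpose routine)."""
    n = abs(n)
    return sum((s / 2.0) ** (2 * m + n) / (math.factorial(m) * math.factorial(m + n))
               for m in range(terms))

def discrete_gaussian(s, eps=1e-8):
    """Kernel T(n; s) = exp(-s) I_n(s) for n = -N..N, with N the smallest
    truncation bound for which the retained mass exceeds 1 - eps."""
    values = [math.exp(-s) * bessel_i(0, s)]       # values[n] = T(n; s), n >= 0
    mass = values[0]
    while mass <= 1.0 - eps:
        values.append(math.exp(-s) * bessel_i(len(values), s))
        mass += 2.0 * values[-1]
    return values[:0:-1] + values                  # symmetric kernel

def smooth_mirror(signal, kernel):
    """Convolve with mirroring at the boundaries (no flux across the ends)."""
    N, L = len(kernel) // 2, len(signal)
    def at(i):
        while i < 0 or i >= L:
            i = -i - 1 if i < 0 else 2 * L - 1 - i
        return signal[i]
    return [sum(kernel[j + N] * at(i - j) for j in range(-N, N + 1))
            for i in range(L)]

s = 4.0                                            # hypothetical spatial variance
kernel = discrete_gaussian(s)
N = len(kernel) // 2
variance = sum(n * n * kernel[n + N] for n in range(-N, N + 1))
signal = [float(i % 7) for i in range(20)]
smoothed = smooth_mirror(signal, kernel)
print(N, sum(kernel), variance)
```

For the separable two-dimensional case, applying the one-dimensional truncation with $\varepsilon/2$ along each axis is sufficient for (63), since $(1 - \varepsilon/2)^2 \geq 1 - \varepsilon$.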

Alternatively, to approximate rotational symmetry by higher
degree of accuracy, one can define the 2-D spatial discrete
scale-space from the solution of (Lindeberg [48, section 4.3])

$$\partial_s L = \frac{1}{2} \left( (1 - \gamma) \nabla_5^2 L + \gamma \nabla_\times^2 L \right), \tag{65}$$

where $(\nabla_\times^2 f)(n_1, n_2) = \frac{1}{2} (f(n_1 + 1, n_2 + 1) +
f(n_1 + 1, n_2 - 1) + f(n_1 - 1, n_2 + 1) + f(n_1 - 1, n_2 - 1) -
4 f(n_1, n_2))$ and specifically the choice $\gamma = 1/3$ gives the
best approximation of rotational symmetry. In practice, this
operation can be implemented (footnote 5) by first one step of diagonal
separable discrete smoothing at scale $s_\times = s/6$ followed by
a Cartesian separable discrete smoothing at scale $s_5 = 2s/3$
or using a closed form expression for the Fourier transform
derived from the difference operators

$$\varphi_T(\theta_1, \theta_2) = e^{-(2 - \gamma) s + (1 - \gamma)(\cos\theta_1 + \cos\theta_2) s + (\gamma \cos\theta_1 \cos\theta_2) s}. \tag{66}$$

5 This four step combined diagonal and Cartesian separability
property can be understood by writing the discrete Laplacian
operator $\nabla_\gamma^2 = (1 - \gamma) \nabla_5^2 + \gamma \nabla_\times^2$ in (65) for $\gamma = 1/3$
as $\nabla_\gamma^2 = \frac{2}{3} (\delta_{xx} + \delta_{yy}) + \frac{1}{3} (\delta_{x'x'} + \delta_{y'y'})$, where $\delta_{xx}$ and
$\delta_{yy}$ are the horizontal and vertical difference operators with
coefficients $(1, -2, 1)$ while $\delta_{x'x'}$ and $\delta_{y'y'}$ are the two
possible diagonal difference operators with coefficients $(1/2, -1, 1/2)$.
The generating function of the convolution kernel is obtained
by exponentiating (65), leading to $\varphi(z, w) = \exp(\frac{s}{2} \nabla_\gamma^2) =
\exp(\frac{s}{3} \delta_{xx}) \exp(\frac{s}{3} \delta_{yy}) \exp(\frac{s}{6} \delta_{x'x'}) \exp(\frac{s}{6} \delta_{y'y'})$ with $\delta_{xx} =
z + z^{-1} - 2$, $\delta_{yy} = w + w^{-1} - 2$, $\delta_{x'x'} = (z w + z^{-1} w^{-1})/2 - 1$
and $\delta_{y'y'} = (z w^{-1} + z^{-1} w)/2 - 1$. The expressions $\exp(\frac{s}{2} \delta_{xx})$ and
$\exp(\frac{s}{2} \delta_{yy})$ are the generating functions of the regular one-dimensional
discrete analogue of the Gaussian kernel $T(n;\, s) = e^{-s} I_n(s)$ along
the horizontal and vertical Cartesian directions, whereas the
expressions $\exp(\frac{s}{2} \delta_{x'x'})$ and $\exp(\frac{s}{2} \delta_{y'y'})$ are the generating functions
corresponding to applying the discrete analogue of the Gaussian kernel
with a different scale parameter $T(n;\, s/2) = e^{-s/2} I_n(s/2)$ in the
two possible diagonal directions. The reason why the scale
parameter is different in the diagonal directions is because of the larger grid
spacing in the diagonal vs. the Cartesian directions.


6.4 Discrete implementation of spatio-temporal receptive 
fields 

For separable spatio-temporal receptive fields, we implement
the spatio-temporal smoothing operation by separable
combination of the spatial and temporal scale-space concepts
in sections 6.2 and 6.3. From this representation,
spatio-temporal derivative approximations are then computed from
difference operators (footnote 6)

$$\delta_t = (-1, +1), \qquad \delta_{tt} = (1, -2, 1) \tag{67}$$

$$\delta_x = (-\tfrac{1}{2}, 0, +\tfrac{1}{2}), \qquad \delta_{xx} = (1, -2, 1) \tag{68}$$

$$\delta_y = (-\tfrac{1}{2}, 0, +\tfrac{1}{2}), \qquad \delta_{yy} = (1, -2, 1) \tag{69}$$


expressed over the appropriate dimensions and with higher
order derivative approximations constructed as combinations
of these primitives, e.g. $\delta_{xy} = \delta_x \delta_y$, $\delta_{xxx} = \delta_x \delta_{xx}$,
$\delta_{xxt} = \delta_{xx} \delta_t$, etc. From the general theory in (Lindeberg [46], [48]) it
follows that the scale-space properties for the original
zero-order signal will be transferred to such derivative
approximations, including a true cascade smoothing property for
the spatio-temporal discrete derivative approximations

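$$L_{x_1^{m_1} x_2^{m_2} t^n}(x_1, x_2, t;\, s_2, \tau_{k_2}) = \left( \left( T(\cdot, \cdot;\, s_2 - s_1) \, (\Delta h)(\cdot;\, \tau_{k_1} \mapsto \tau_{k_2}) \right) * L_{x_1^{m_1} x_2^{m_2} t^n}(\cdot, \cdot, \cdot;\, s_1, \tau_{k_1}) \right)(x_1, x_2, t;\, s_2, \tau_{k_2}). \tag{70}$$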


6 Note that the below purely one-dimensional spatial derivative
approximation operators are primarily intended to be used in connection
with the separable discrete spatial scale-space concept (61). When
using the non-separable discrete spatial scale-space concept (65) that
enables better numerical approximation to rotational invariance, it can be
motivated to also use two-dimensional discrete derivative
approximation operators (Lindeberg [48, section 5.3.3.2]).


The motivation for using symmetric (footnote 7) differences for the first
order spatial derivative approximations $\delta_x$ and $\delta_y$ is to have
the derivative approximations maximally accurate at the grid
points to enable straightforward combination into higher
order differential invariants over image space. With this choice
also certain algebraic relations that hold for derivatives of
continuous Gaussian kernels will be transferred to
corresponding algebraic relations for difference approximations
applied to the discrete Gaussian kernel (Lindeberg [48,
equations (5.34) and (5.36) at page 133]):

$$(\delta_x T)(x;\, t) = -\frac{x}{t} \, T(x;\, t) \tag{71}$$

$$(\delta_{xx} T)(x;\, t) = 2 \, (\partial_t T)(x;\, t) \tag{72}$$
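The two relations $\delta_x T = -(x/t)\, T$ and $\delta_{xx} T = 2\, \partial_t T$ for the one-dimensional discrete Gaussian kernel $T(x;\, t) = e^{-t} I_x(t)$ can be checked numerically. The sketch below is only an illustration: it evaluates the Bessel function by its power series and approximates the scale derivative by a central finite difference with an arbitrarily chosen step $h$, both of which are choices made here and not part of the original treatment.

```python
import math

def bessel_i(n, t, terms=60):
    """Modified Bessel function I_n(t) of integer order via its power series."""
    n = abs(n)
    return sum((t / 2.0) ** (2 * m + n) / (math.factorial(m) * math.factorial(m + n))
               for m in range(terms))

def T(x, t):
    """One-dimensional discrete analogue of the Gaussian kernel."""
    return math.exp(-t) * bessel_i(x, t)

t, h = 2.0, 1e-4          # scale value and finite-difference step (arbitrary)
res_first, res_second = 0.0, 0.0
for x in range(-4, 5):
    d_x = 0.5 * (T(x + 1, t) - T(x - 1, t))               # (-1/2, 0, +1/2)
    d_xx = T(x + 1, t) - 2.0 * T(x, t) + T(x - 1, t)      # (1, -2, 1)
    dT_dt = (T(x, t + h) - T(x, t - h)) / (2.0 * h)       # scale derivative
    res_first = max(res_first, abs(d_x + (x / t) * T(x, t)))
    res_second = max(res_second, abs(d_xx - 2.0 * dT_dt))
print(res_first, res_second)
```

The first residual is at rounding level because the relation is an exact Bessel function identity, whereas the second is limited by the accuracy of the finite-difference approximation of $\partial_t$.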

The motivation for using non-symmetric first-order derivative
approximations $(-1, 1)$ over time is that temporal causality
makes it impossible to access data from the future; within this
constraint, the temporal delay is minimized as much as possible
by using temporal difference operators of minimum support.
Because of the time-causal temporal smoothing operation, one anyway
gets an additional and much larger temporal delay, which
implies that all filter responses are computed with a certain and
non-negligible temporal delay.

For non-separable spatio-temporal receptive fields
corresponding to a non-zero image velocity $v = (v_1, v_2)^T$,
we implement the spatio-temporal smoothing operation by
first warping the video data $(x_1', x_2')^T = (x_1 - v_1 t, x_2 - v_2 t)^T$
using spline interpolation. Then, we apply separable
spatio-temporal smoothing in the transformed domain and
unwarp the result back to the original domain. Over a
continuous domain, such an operation is equivalent to convolution
with corresponding velocity-adapted spatio-temporal receptive
fields, while being significantly faster in a discrete
implementation than explicit convolution with non-separable
receptive fields over three dimensions.

7 It should be noted, however, that as a side effect of this choice
of a symmetric first-order derivative approximation, the second order
difference operator $\delta_{xx}$ will not be equal to the first-order difference
operator applied twice, $\delta_x \delta_x \neq \delta_{xx}$. Since the symmetric first-order
difference operator $(-1/2, 0, 1/2)$ corresponds to the result of smoothing
the tighter and non-symmetric difference operator $(-1, 1)$ with the
binomial kernel $(1/2, 1/2)$, the symmetric first-order derivative
approximation could be seen as computed at a slightly coarser scale
$\Delta s = 1/4$ compared to the second-order derivative approximation obtained by
the second-order difference operator $(1, -2, 1)$ corresponding to the
tighter first-order difference $(-1, 1)$ applied twice. If that would be
regarded as a problem, one could try to compensate for this effect
by smoothing the second-order derivative kernels with the symmetric
generalized binomial kernel $(\Delta s/2, 1 - \Delta s, \Delta s/2)$ for $\Delta s = 1/4$,
however, then at the cost of destroying the relation (72) between the
second-order derivative approximations and derivatives with respect to
scale. The generalized binomial kernel $(\Delta s/2, 1 - \Delta s, \Delta s/2)$ is for
$0 \leq \Delta s \leq 1/2$ also a discrete scale-space kernel and has variance $\Delta s$
(Lindeberg [35, sections 3.2.2 and 3.6.2]).

7 Scale normalization for spatio-temporal derivatives

When computing spatio-temporal derivatives at different scales,
some mechanism is needed for normalizing the derivatives
with respect to the spatial and temporal scales, to make
derivatives at different spatial and temporal scales comparable and
to enable spatial and temporal scale selection.


7.1 Scale normalization of spatial derivatives

For the Gaussian scale-space concept defined over a purely
spatial domain, it can be shown that the canonical way of
defining scale-normalized derivatives at different spatial scales
$s$ is according to (Lindeberg [53])

$$\partial_{\xi_1} = s^{\gamma_s/2} \, \partial_{x_1}, \qquad \partial_{\xi_2} = s^{\gamma_s/2} \, \partial_{x_2}, \tag{73}$$

where $\gamma_s$ is a free parameter. Specifically, it can be shown
(Lindeberg [53, section 9.1]) that this notion of $\gamma$-normalized
derivatives corresponds to normalizing the $m$:th order Gaussian
derivatives $g_{\xi^m} = g_{\xi_1^{m_1} \ldots \xi_N^{m_N}}$ in $N$-dimensional image
space to constant $L_p$-norms over scale

$$\| g_{\xi^m}(\cdot;\, s) \|_p = \left( \int_{x \in \mathbb{R}^N} | g_{\xi^m}(x;\, s) |^p \, dx \right)^{1/p} = G_{m, \gamma_s} \tag{74}$$

with

$$p = \frac{1}{1 + \frac{|m|}{N} (1 - \gamma_s)} \tag{75}$$

where the perfectly scale invariant case $\gamma_s = 1$ corresponds
to $L_1$-normalization for all orders $|m| = m_1 + \cdots + m_N$.
In this paper, we will throughout use this approach for
normalizing spatial differentiation operators with respect to the
spatial scale parameter $s$.


7.2 Scale normalization of temporal derivatives

If using a non-causal Gaussian temporal scale-space
concept, scale-normalized temporal derivatives can be defined
in an analogous way as scale-normalized spatial derivatives
as described in the previous section.

For the time-causal temporal scale-space concept based
on first-order temporal integrators coupled in cascade, we
can also define a corresponding notion of scale-normalized
temporal derivatives

$$\partial_{\zeta^n} = \tau^{n \gamma_\tau / 2} \, \partial_{t^n} \tag{76}$$

which will be referred to as variance-based normalization,
reflecting the fact that the parameter $\tau$ corresponds to the variance of
the composed temporal smoothing kernel. Alternatively, we
can determine a temporal scale normalization factor $\alpha_{n,\gamma_\tau}(\tau)$

$$\partial_{\zeta^n} = \alpha_{n,\gamma_\tau}(\tau) \, \partial_{t^n} \tag{77}$$

such that the $L_p$-norm (with $p$ determined as function of
$\gamma$ according to (75)) of the corresponding composed
scale-normalized temporal derivative computation kernel $\alpha_{n,\gamma_\tau}(\tau) \, h_{t^n}$
equals the $L_p$-norm of some other reference kernel, where
we here initially take the $L_p$-norm of the corresponding
Gaussian derivative kernels

$$\| \alpha_{n,\gamma_\tau}(\tau) \, h_{t^n}(\cdot;\, \tau) \|_p = \alpha_{n,\gamma_\tau}(\tau) \, \| h_{t^n}(\cdot;\, \tau) \|_p = \| g_{\xi^n}(\cdot;\, \tau) \|_p = G_{n,\gamma_\tau}. \tag{78}$$

This latter approach will be referred to as $L_p$-normalization (footnote 8).

For the discrete temporal scale-space concept over
discrete time, scale normalization factors for discrete
$l_p$-normalization are defined in an analogous way with the only
difference that the continuous $L_p$-norm is replaced by a discrete
$l_p$-norm.

In the specific case when the temporal scale-space
representation is defined by convolution with the scale-invariant
time-causal limit kernel according to (39) and (38), it is
shown in appendix C that the corresponding scale-normalized
derivatives become truly scale covariant under temporal
scaling transformations $t' = c^j t$ with scaling factors $S = c^j$ that
are integer powers of the distribution parameter $c$

$$L'_{\zeta'^n}(t';\, \tau', c) = c^{j n (\gamma - 1)} \, L_{\zeta^n}(t;\, \tau, c) = c^{j (1 - 1/p)} \, L_{\zeta^n}(t;\, \tau, c) \tag{79}$$

between matching temporal scale levels $\tau' = c^{2j} \tau$.
Specifically, for $\gamma = 1$ corresponding to $p = 1$ the scale-normalized
temporal derivatives become fully scale invariant

$$L'_{\zeta'^n}(t';\, \tau', c) = L_{\zeta^n}(t;\, \tau, c). \tag{80}$$
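The exponent $j(1 - 1/p)$ of the covariance factor can be related directly to the order of differentiation $n$ and the parameter $\gamma$. A short derivation, assuming that the relation (75) between $p$ and $\gamma$ is specialized to the one-dimensional temporal case with $N = 1$ and $|m| = n$:

```latex
\begin{align*}
p = \frac{1}{1 + n (1 - \gamma)}
  \quad &\Longrightarrow \quad \frac{1}{p} = 1 + n (1 - \gamma) \\
  \quad &\Longrightarrow \quad 1 - \frac{1}{p} = n (\gamma - 1), \\
c^{\, j n (\gamma - 1)} &= c^{\, j (1 - 1/p)}, \\
\gamma = 1 \quad &\Longrightarrow \quad p = 1, \qquad c^{\, j (1 - 1/p)} = c^{0} = 1 .
\end{align*}
```

Under this reading, the factor reduces to one exactly when $\gamma = 1$, which is the fully scale invariant case.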


7.3 Computation of temporal scale normalization factors 
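For computing the temporal scale normalization factors

$$\alpha_{n,\gamma_\tau}(\tau) = \frac{\| g_{\xi^n}(\cdot;\, \tau) \|_p}{\| h_{t^n}(\cdot;\, \tau) \|_p} = \frac{G_{n,\gamma_\tau}}{\| h_{t^n}(\cdot;\, \tau) \|_p} \tag{81}$$

in (77) for $L_p$-normalization according to (78), we compute
the $L_p$-norms of the scale-normalized Gaussian derivatives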


8 These definitions generalize the previously defined notions of
$L_p$-normalization and variance-based normalization over discrete
scale-space representation in (Lindeberg [55]) and pyramids in (Lindeberg
and Bretzner [65]) to temporal scale-space representations.

In the tables below, α denotes the factor $\alpha_{n,\gamma_\tau}(\tau)$.

Temporal scale normalization factors for n = 1 at τ = 1

 K    τ^{n/2}    α (uni)    α (c = √2)    α (c = 2^{3/4})    α (c = 2)
 2    1.000      0.744      0.744         0.737              0.723
 3    1.000      0.805      0.794         0.765              0.736
 4    1.000      0.847      0.814         0.771              0.737
 5    1.000      0.877      0.821         0.772              0.738
 6    1.000      0.901      0.823         0.772              0.738
 7    1.000      0.920      0.823         0.772              0.738
 8    1.000      0.935      0.823         0.772              0.738
16    1.000      0.998      0.823         0.772              0.738

Temporal scale normalization factors for n = 1 at τ = 16

 K    τ^{n/2}    α (uni)    α (c = √2)    α (c = 2^{3/4})    α (c = 2)
 2    4.000      3.056      3.056         3.016              2.938
 3    4.000      3.398      3.341         3.210              3.041
 4    4.000      3.553      3.432         3.223              3.068
 5    4.000      3.642      3.442         3.227              3.071
 6    4.000      3.731      3.452         3.228              3.071
 7    4.000      3.744      3.457         3.228              3.071
 8    4.000      3.809      3.459         3.228              3.071
16    4.000      3.891      3.460         3.338              3.071

Temporal scale normalization factors for n = 1 at τ = 256

 K    τ^{n/2}    α (uni)    α (c = √2)    α (c = 2^{3/4})    α (c = 2)
 2    16.000     12.270     12.270        12.084             11.711
 3    16.000     13.612     13.420        12.835             12.147
 4    16.000     14.242     13.732        12.932             12.162
 5    16.000     14.610     13.815        12.930             12.155
 6    16.000     14.850     13.816        12.927             12.152
 7    16.000     15.018     13.817        12.922             12.151
 8    16.000     15.145     13.817        12.922             12.151
16    16.000     15.583     13.816        12.922             12.151

Table 3 Numerical values of scale normalization factors for discrete temporal derivative approximations, using either variance-based
normalization $\tau^{n/2}$ or $l_p$-normalization $\alpha_{n,\gamma_\tau}(\tau)$, for temporal derivatives of order n = 1 and at temporal scales τ = 1, τ = 16 and τ = 256 relative to a
unit temporal sampling rate with Δt = 1 and with $\gamma_\tau = 1$, for time-causal kernels obtained by coupling K first-order recursive filters in cascade
with either a uniform distribution of the intermediate scale levels or a logarithmic distribution for c = √2, c = 2^{3/4} and c = 2.


from closed-form expressions if $\gamma = 1$ (corresponding to
$p = 1$)

$$G_{1,1} = \int_{-\infty}^{\infty} | g_{\xi}(u;\, t) | \, du \, \Big|_{\gamma=1} \approx 0.797885, \tag{82}$$

$$G_{2,1} = \int_{-\infty}^{\infty} | g_{\xi^2}(u;\, t) | \, du \, \Big|_{\gamma=1} \approx 0.967883, \tag{83}$$

$$G_{3,1} = \int_{-\infty}^{\infty} | g_{\xi^3}(u;\, t) | \, du \, \Big|_{\gamma=1} \approx 1.51003, \tag{84}$$

or for values of $\gamma \neq 1$ by numerical integration. For
computing the discrete $l_p$-norm of discrete temporal derivative
approximations, we first (i) filter a discrete delta function by
the corresponding cascade of first-order integrators to
obtain the temporal smoothing kernel and then (ii) apply
discrete derivative approximation operators to this kernel to
obtain the corresponding equivalent temporal derivative
kernel, (iii) from which the discrete $l_p$-norm is computed by
straightforward summation.

In the tables below, α denotes the factor $\alpha_{n,\gamma_\tau}(\tau)$.

Temporal scale normalization factors for n = 2 at τ = 1

 K    τ^{n/2}    α (uni)    α (c = √2)    α (c = 2^{3/4})    α (c = 2)
 2    1.000      0.617      0.617         0.606              0.586
 3    1.000      0.711      0.694         0.649              0.607
 4    1.000      0.738      0.718         0.659              0.609
 5    1.000      0.755      0.721         0.660              0.609
 6    1.000      0.768      0.722         0.660              0.609
 7    1.000      0.779      0.722         0.660              0.609
 8    1.000      0.787      0.722         0.660              0.609
16    1.000      0.824      0.722         0.660              0.609

Temporal scale normalization factors for n = 2 at τ = 16

 K    τ^{n/2}    α (uni)    α (c = √2)    α (c = 2^{3/4})    α (c = 2)
 2    16.000     4.622      4.622         4.472              4.172
 3    16.000     8.429      8.017         6.897              5.701
 4    16.000     10.184     9.160         7.885              6.208
 5    16.000     11.363     9.698         7.871              6.296
 6    16.000     12.241     10.022        7.864              6.305
 7    16.000     12.690     10.088        7.862              6.305
 8    16.000     13.106     10.068        7.862              6.305
16    16.000     14.575     10.058        7.862              6.305

Temporal scale normalization factors for n = 2 at τ = 256

 K    τ^{n/2}    α (uni)    α (c = √2)    α (c = 2^{3/4})    α (c = 2)
 2    256.00     58.95      58.95         56.63              51.84
 3    256.00     133.37     127.68        112.66             94.71
 4    256.00     165.14     148.96        124.04             101.16
 5    256.00     183.75     156.04        126.42             101.13
 6    256.00     195.99     158.69        126.65             101.12
 7    256.00     204.71     159.17        126.56             101.12
 8    256.00     211.10     159.23        126.55             101.12
16    256.00     233.78     159.28        126.55             101.12

Table 4 Numerical values of scale normalization factors for discrete temporal derivative approximations, for either variance-based normalization
$\tau^{n/2}$ or $l_p$-normalization $\alpha_{n,\gamma_\tau}(\tau)$, for temporal derivatives of order n = 2 and at temporal scales τ = 1, τ = 16 and τ = 256 relative to a unit
temporal sampling rate with Δt = 1 and with $\gamma_\tau = 1$, for time-causal kernels obtained by coupling K first-order recursive filters in cascade with
either a uniform distribution of the intermediate scale levels or a logarithmic distribution for c = √2, c = 2^{3/4} and c = 2.
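The three-step procedure for the discrete $l_p$-norm (filter a delta function, apply the difference operators, sum) can be sketched as follows for $\gamma_\tau = 1$, i.e. $p = 1$. This is an illustrative sketch and not the authors' code: it assumes uniformly spaced scale levels $\tau_k = k \tau / K$, takes the Gaussian reference norms from the numerical values in (82) and (83), and applies the backward difference $(-1, +1)$ twice for the second order, which differs from $(1, -2, 1)$ only by a shift that does not affect the norm.

```python
import math

# L1-norms of scale-normalized Gaussian derivatives for gamma = 1, from (82)-(83).
G = {1: 0.797885, 2: 0.967883}

def smoothing_kernel(tau_levels, length):
    """Step (i): filter a discrete delta function through the cascade (56)."""
    kernel, tau_prev = [1.0] + [0.0] * (length - 1), 0.0
    for tau_k in tau_levels:
        mu = (math.sqrt(1.0 + 4.0 * (tau_k - tau_prev)) - 1.0) / 2.0   # (58)
        prev, out = 0.0, []
        for x in kernel:
            prev += (x - prev) / (1.0 + mu)
            out.append(prev)
        kernel, tau_prev = out, tau_k
    return kernel

def difference(kernel):
    """Step (ii): temporal difference operator (-1, +1)."""
    return [kernel[0]] + [kernel[i] - kernel[i - 1] for i in range(1, len(kernel))]

def alpha(n, tau, K, length=2000):
    """Normalization factor (81) for p = 1 and uniform levels (assumed)."""
    levels = [tau * k / K for k in range(1, K + 1)]
    d = smoothing_kernel(levels, length)
    for _ in range(n):
        d = difference(d)
    return G[n] / sum(abs(v) for v in d)          # step (iii): l_1-norm

print(alpha(1, 1.0, 2), alpha(2, 1.0, 2), alpha(1, 256.0, 2))
```

For K = 2 the computed factors can be compared with the uniform-distribution columns of Tables 3 and 4 (0.744 and 12.270 for n = 1 at τ = 1 and τ = 256, and 0.617 for n = 2 at τ = 1).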


To illustrate how the choice of temporal scale normalization
method may affect the results in a discrete implementation,
Tables 3 and 4 show examples of temporal scale normalization
factors computed in these ways by either (i) variance-based
normalization $\tau^{n/2}$ according to (76) or (ii) $l_p$-normalization
$\alpha_{n,\gamma_\tau}(\tau)$ according to (77)-(78) for different
orders of temporal differentiation $n$, different
distribution parameters $c$ and at different temporal scales $\tau$,
relative to a unit temporal sampling rate. The value $c = \sqrt{2}$
corresponds to a natural minimum value of the distribution
parameter from the constraint $\mu_2 \geq \mu_1$, the value $c = 2$ to a
doubling scale sampling strategy as used in regular spatial
pyramids and $c = 2^{3/4}$ to a natural intermediate value
between these two. The temporal scale level $\tau = 1$ is near the
discrete temporal sampling rate where temporal discretization
effects are strong, $\tau = 16$ is a higher temporal scale
where temporal sampling effects are moderate and $\tau = 256$
corresponds to a temporal scale much higher than the discrete
temporal sampling distance, so that the temporal sampling
effects can be expected to be small.

Notably, the numerical values of the resulting scale
normalization factors may differ substantially depending on the
type of scale normalization method and the underlying number
of first-order recursive filters that are coupled in cascade.
Therefore, the choice of temporal scale normalization
method warrants specific attention in applications where the
relations between numerical values of temporal derivatives
at different temporal scales may have critical influence.

Specifically, we can note that the temporal scale normalization
factors based on $l_p$-normalization differ more from
the scale normalization factors from variance-based
normalization (i) in the case of a logarithmic distribution of the
intermediate temporal scale levels compared to a uniform
distribution, (ii) when the distribution parameter $c$ increases
within the family of temporal receptive fields based on a
logarithmic distribution of the intermediate scale levels or (iii) when a
very low number of recursive filters are coupled in cascade.
In all three cases, the resulting temporal smoothing kernels
become more asymmetric and do hence differ more from the
symmetric Gaussian model.

On the other hand, with increasing values of $K$ the
numerical values of the scale normalization factors converge
much faster to their limit values when using a logarithmic
distribution of the intermediate scale levels compared to
using a uniform distribution. Depending on the value of the
distribution parameter $c$, the scale normalization factors
approach their limit values reasonably well after $K = 4$ to
$K = 8$ scale levels, whereas much larger values of $K$ would
be needed if using a uniform distribution. The convergence
rate is faster for larger values of $c$.

Relative deviation from limit of scale normalization factors for n = 1 at τ = 256

 K    ε_n (uni)    ε_n (c = √2)     ε_n (c = 2^{3/4})    ε_n (c = 2)
 2    0.233        1.1 · 10^{-1}    6.5 · 10^{-2}        3.6 · 10^{-2}
 4    0.110        6.1 · 10^{-3}    8.5 · 10^{-4}        8.6 · 10^{-4}
 8    0.053        4.9 · 10^{-4}    1.1 · 10^{-5}        2.0 · 10^{-7}
16    0.026        1.2 · 10^{-7}    9.0 · 10^{-13}       1.5 · 10^{-15}
32    0.013        3.1 · 10^{-14}   2.9 · 10^{-14}       3.4 · 10^{-14}

Relative deviation from limit of scale normalization factors for n = 2 at τ = 256

 K    ε_n (uni)    ε_n (c = √2)     ε_n (c = 2^{3/4})    ε_n (c = 2)
 2    0.770        6.3 · 10^{-1}    5.5 · 10^{-1}        4.9 · 10^{-1}
 4    0.354        6.5 · 10^{-2}    2.0 · 10^{-2}        4.1 · 10^{-2}
 8    0.174        3.2 · 10^{-4}    1.3 · 10^{-5}        1.6 · 10^{-8}
16    0.085        1.8 · 10^{-7}    1.0 · 10^{-12}       9.6 · 10^{-15}
32    0.042        1.2 · 10^{-13}   6.2 · 10^{-14}       4.0 · 10^{-14}

Table 5 Numerical estimates of the relative deviation from the limit case when using different numbers K of temporal scale levels for a uniform
vs. a logarithmic distribution of the intermediate scale levels. The deviation measure $\varepsilon_n$ according to equation (87) measures the relative deviation
of the scale normalization factors when using a finite number K of temporal scale levels compared to the limit case when the number of temporal
scale levels K tends to infinity. (These estimates have been computed at a coarse temporal scale τ = 256 relative to a unit grid spacing so that the
influence of discretization effects should be small. The limit case has been approximated by K = 1000 for the uniform distribution and K = 500
for the logarithmic distribution.)

7.4 Measuring the deviation from the scale-invariant 
time-causal limit kernel 

To quantify how good an approximation a time-causal
kernel with a finite number of $K$ scale levels is to the limit
case when the number of scale levels $K$ tends to infinity, let
us measure the relative deviation of the scale normalization
factors from the limit kernel according to

$$\varepsilon_n(\tau) = \frac{\left| \alpha_n(\tau)|_K - \alpha_n(\tau)|_{K \to \infty} \right|}{\alpha_n(\tau)|_{K \to \infty}}. \tag{87}$$

Table 5 shows numerical estimates of this relative deviation
measure for different values of $K$ from $K = 2$ to $K = 32$
for the time-causal kernels obtained from a uniform vs. a
logarithmic distribution of the scale values. From the table,
we can first note that the convergence rate with increasing
values of $K$ is significantly faster when using a logarithmic
vs. a uniform distribution of the intermediate scale levels.

Not even $K = 32$ scale levels is sufficient to drive the
relative deviation measure below 1 % for a uniform
distribution, whereas the corresponding deviation measures are
down to machine precision when using $K = 32$ levels for
a logarithmic distribution. When using $K = 4$ scale levels,
the relative deviation measure is down to $10^{-2}$ to $10^{-4}$ for
a logarithmic distribution. If using $K = 8$ scale levels, the
relative deviation measure is down to $10^{-4}$ to $10^{-8}$
depending on the value of the distribution parameter $c$ and the order
$n$ of differentiation.
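The relative deviation measure can be estimated numerically along the following lines for the uniform distribution, first-order derivatives and $p = 1$. The sketch is hedged in several respects: the limit case is approximated here by `K_LIMIT = 200` levels instead of the K = 1000 used for Table 5, so the printed values are only expected to be close to the uniform column of that table, and the kernel length and all names are choices made for this illustration.

```python
import math

G1 = 0.797885      # L1-norm of the first-order Gaussian derivative, from (82)
K_LIMIT = 200      # stand-in for the limit K -> infinity (assumption)

def alpha_uniform(tau, K, length=1500):
    """l_1-normalization factor for n = 1 with K uniformly spaced levels."""
    mu = (math.sqrt(1.0 + 4.0 * tau / K) - 1.0) / 2.0   # equal increments tau / K
    kernel = [1.0] + [0.0] * (length - 1)
    for _ in range(K):
        prev, out = 0.0, []
        for x in kernel:
            prev += (x - prev) / (1.0 + mu)
            out.append(prev)
        kernel = out
    diff = [kernel[0]] + [kernel[i] - kernel[i - 1] for i in range(1, length)]
    return G1 / sum(abs(d) for d in diff)

tau = 256.0
alpha_limit = alpha_uniform(tau, K_LIMIT)

def relative_deviation(K):
    """Relative deviation of the normalization factor from the approximated limit."""
    return abs(alpha_uniform(tau, K) - alpha_limit) / alpha_limit

deviations = {K: relative_deviation(K) for K in (2, 4, 8)}
for K, eps in deviations.items():
    print(K, round(eps, 3))
```

The slow decrease of the deviation with K for the uniform distribution is the behaviour that motivates the recommendation of a logarithmic distribution.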

From these results, we can conclude that one should not use too few recursive
filters coupled in cascade when computing temporal derivatives. Our
recommendation is to use a logarithmic distribution with a minimum of four
recursive filters for derivatives up to order two at finer scales and a larger
number of recursive filters at coarser scales. When performing computations at
a single temporal scale, we often use K = 7 or K = 8 as default.
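
As a minimal sketch of this recommendation, the following Python code sets up K = 7 first-order recursive filters with logarithmically distributed scale levels and checks that the cascade has unit mass and the intended temporal variance. It assumes the relations $\tau_k = c^{2(k-K)} \tau_{max}$ for the scale levels, the recursive update $f_{out}(t) = f_{out}(t-1) + (f_{in}(t) - f_{out}(t-1))/(1 + \mu_k)$ and a variance of $\mu_k^2 + \mu_k$ per discrete filter, as stated in the earlier sections on the time-causal kernels and their discrete implementation. The function names and the value $\tau_{max} = 16$ are illustrative choices, not part of the theory.

```python
import math

def log_scale_levels(tau_max, K, c):
    """Temporal scale levels (variances) tau_k = c**(2*(k - K)) * tau_max, k = 1..K."""
    return [c ** (2 * (k - K)) * tau_max for k in range(1, K + 1)]

def discrete_time_constants(taus):
    """Time constants mu_k such that the discrete variances mu_k**2 + mu_k
    equal the increments tau_k - tau_{k-1} between adjacent scale levels."""
    mus, prev = [], 0.0
    for tau in taus:
        dtau = tau - prev
        mus.append((math.sqrt(1.0 + 4.0 * dtau) - 1.0) / 2.0)
        prev = tau
    return mus

def recursive_cascade(signal, mus):
    """Apply one first-order recursive filter per time constant, in cascade."""
    out = list(signal)
    for mu in mus:
        state, filtered = 0.0, []
        for value in out:
            # f_out(t) = f_out(t-1) + (f_in(t) - f_out(t-1)) / (1 + mu)
            state += (value - state) / (1.0 + mu)
            filtered.append(state)
        out = filtered
    return out

taus = log_scale_levels(tau_max=16.0, K=7, c=math.sqrt(2.0))
mus = discrete_time_constants(taus)

# Impulse response of the cascade = the discrete time-causal kernel.
impulse = [1.0] + [0.0] * 399
kernel = recursive_cascade(impulse, mus)
mass = sum(kernel)
mean = sum(t * h for t, h in enumerate(kernel))
variance = sum((t - mean) ** 2 * h for t, h in enumerate(kernel))
```

In this sketch the temporal mean of the kernel, which equals the sum of the time constants, gives a simple measure of the temporal delay, so different choices of K and c can be compared by inspecting `mean`.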

8 Spatio-temporal feature detection 

In the following, we shall apply the above theoretical framework for
separable time-causal spatio-temporal receptive fields for computing different
types of spatio-temporal features, defined from spatio-temporal derivatives of
different spatial and temporal orders, which may additionally be combined
into composed (linear or non-linear) differential expressions.


8.1 Partial derivatives 

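A most basic approach is to first define a spatio-temporal scale-space
representation $L\colon \mathbb{R}^2 \times \mathbb{R} \times \mathbb{R}_+ \times \mathbb{R}_+$
from any video data $f\colon \mathbb{R}^2 \times \mathbb{R}$ and then define
partial derivatives of any spatial and temporal orders $m = (m_1, m_2)$ and $n$
at any spatial and temporal scales $s$ and $\tau$ according to

$$L_{x_1^{m_1} x_2^{m_2} t^n}(x_1, x_2, t;\; s, \tau) = \partial_{x_1^{m_1} x_2^{m_2} t^n} \left( (g(\cdot, \cdot;\; s)\, h(\cdot;\; \tau)) * f(\cdot, \cdot, \cdot) \right)(x_1, x_2, t;\; s, \tau) \tag{88}$$

leading to a spatio-temporal N-jet representation of any order

$$\{ L_x, L_y, L_t, L_{xx}, L_{xy}, L_{yy}, L_{xt}, L_{yt}, L_{tt}, \ldots \}. \tag{89}$$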

Figure 5 shows such kernels up to order two in the case of a
1 + 1-D space-time.

Fig. 5 (panels: $T(x, t;\; s, \tau)$; $T_x$, $T_t$; $T_{xx}$, $T_{xt}$, $T_{tt}$)
Space-time separable kernels
$T_{x^m t^n}(x, t;\; s, \tau) = \partial_{x^m t^n} (g(x;\; s)\, h(t;\; \tau))$ up to order
two obtained as the composition of Gaussian kernels over the spatial domain x
and a cascade of truncated exponential kernels over the temporal domain t with
a logarithmic distribution of the intermediate temporal scale levels
($s = 1$, $\tau = 1$, $K = 7$, $c = \sqrt{2}$). (Horizontal axis: space x.
Vertical axis: time t.)


8.2 Directional derivatives 




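By combining spatial directional derivative operators over any pair of
orthogonal directions $\partial_{\varphi} = \cos\varphi\, \partial_x + \sin\varphi\, \partial_y$
and $\partial_{\perp\varphi} = \sin\varphi\, \partial_x - \cos\varphi\, \partial_y$ and
velocity-adapted temporal derivatives
$\partial_{\bar{t}} = \partial_t + v_x\, \partial_x + v_y\, \partial_y$ over any motion
direction $v = (v_x, v_y, 1)$, a filter bank of spatio-temporal derivative
responses can be created

$$L_{\varphi^{m_1} \perp\varphi^{m_2} \bar{t}^n} = \partial_{\varphi}^{m_1}\, \partial_{\perp\varphi}^{m_2}\, \partial_{\bar{t}}^{n}\, L \tag{90}$$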


for different sampling strategies over image orientations $\varphi$ and
$\perp\varphi$ in image space and over motion directions $v$ in space-time (see
Figure 6 for illustrations of such kernels up to order two in the case of a
1 + 1-D space-time).

Note that as long as the spatio-temporal smoothing operations are performed
based on rotationally symmetric Gaussians over the spatial domain and using
space-time separable kernels over space-time, the responses to these
directional derivative operators can be directly related to corresponding
partial derivative operators by mere linear combinations. If extending the
rotationally symmetric Gaussian scale-space concept to an anisotropic affine
Gaussian scale-space and/or if we make use of non-separable velocity-adapted
receptive fields over space-time in a spatio-temporal scale space, to enable
true affine and/or Galilean invariances, such linear relationships will,
however, no longer hold in a similar form.
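
The linear relationships mentioned above can be sketched as follows for derivatives up to order two. This is a minimal illustration that expands the operators $\partial_{\varphi}$, $\partial_{\perp\varphi}$ and $\partial_{\bar{t}}$ defined in this section algebraically; the function names and the numerical input values are hypothetical.

```python
import math

def directional_jet(Lx, Ly, Lxx, Lxy, Lyy, phi):
    """Directional derivatives up to order two along phi and the orthogonal
    direction, written as linear combinations of partial derivatives."""
    c, s = math.cos(phi), math.sin(phi)
    return {
        "phi": c * Lx + s * Ly,                      # d_phi = cos(phi) d_x + sin(phi) d_y
        "perp": s * Lx - c * Ly,                     # d_perp = sin(phi) d_x - cos(phi) d_y
        "phi_phi": c * c * Lxx + 2 * c * s * Lxy + s * s * Lyy,
        "phi_perp": c * s * (Lxx - Lyy) + (s * s - c * c) * Lxy,
        "perp_perp": s * s * Lxx - 2 * c * s * Lxy + c * c * Lyy,
    }

def velocity_adapted_dt(Lt, Lx, Ly, vx, vy):
    """Velocity-adapted temporal derivative d_t + vx d_x + vy d_y applied to L."""
    return Lt + vx * Lx + vy * Ly

jet = directional_jet(Lx=1.0, Ly=2.0, Lxx=3.0, Lxy=0.5, Lyy=-1.0, phi=math.pi / 6)
axis_aligned = directional_jet(Lx=1.0, Ly=2.0, Lxx=3.0, Lxy=0.5, Lyy=-1.0, phi=0.0)
```

Two quick consistency checks follow from the orthogonality of the two directions: the sum of the two pure second-order directional derivatives equals the Laplacian, and the squared first-order responses add up to the squared gradient magnitude, for any orientation.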



Fig. 6 Velocity-adapted spatio-temporal kernels
$T_{x^m \bar{t}^n}(x, t;\; s, \tau, v) = \partial_{x^m \bar{t}^n} (g(x - vt;\; s)\, h(t;\; \tau))$
up to order two obtained as the composition of Gaussian kernels over the
spatial domain x and a cascade of truncated exponential kernels over the
temporal domain t with a logarithmic distribution of the intermediate temporal
scale levels ($s = 1$, $\tau = 1$, $K = 7$, $c = \sqrt{2}$, $v = 0.5$).
(Horizontal axis: space x. Vertical axis: time t.)
For the image orientations $\varphi$ and $\perp\varphi$, it is for purely
spatial derivative operations, in the case of rotationally symmetric smoothing
over the spatial domain, in principle sufficient to sample the image
orientation according to a uniform distribution on the semi-circle using at
least $|m| + 1$ directional derivative filters for derivatives of order $|m|$.

For temporal directional derivative operators to make full sense in a
geometrically meaningful manner (covariance under Galilean transformations of
space-time), they should however also be combined with Galilean velocity
adaptation of the spatio-temporal smoothing operation in a corresponding
direction $v$ according to (1) (Lindeberg [51, 56]; Laptev and Lindeberg
[44, 42]). Regarding the distribution of such motion directions
$v = (v_x, v_y)$, it is natural to distribute the magnitudes
$|v| = \sqrt{v_x^2 + v_y^2}$ according to a self-similar distribution

$$|v|_j = |v|_1\, \rho^j, \quad j = 1 \ldots J \tag{91}$$

for some suitably selected constant $\rho > 1$ and using a uniform
distribution of the motion directions $e_v = v/|v|$ on the full circle.


8.3 Differential invariants over spatial derivative operators 


Over the spatial domain, we will in this treatment make use of the gradient
magnitude $|\nabla_{(x,y)} L|$, the Laplacian $\nabla^2_{(x,y)} L$, the
determinant of the Hessian $\det \mathcal{H}_{(x,y)} L$, the rescaled level
curve curvature $\tilde{\kappa}(L)$ and the quasi quadrature energy measure
$\mathcal{Q}_{(x,y)} L$, which are transformed to scale-normalized differential
expressions with $\gamma = 1$ (Lindeberg [48, 53, 55]):

$$|\nabla_{(x,y),norm} L| = \sqrt{s L_x^2 + s L_y^2} = \sqrt{s}\, |\nabla_{(x,y)} L|, \tag{92}$$

$$\nabla^2_{(x,y),norm} L = s\, (L_{xx} + L_{yy}) = s\, \nabla^2_{(x,y)} L, \tag{93}$$

$$\det \mathcal{H}_{(x,y),norm} L = s^2 (L_{xx} L_{yy} - L_{xy}^2) = s^2 \det \mathcal{H}_{(x,y)} L, \tag{94}$$

$$\tilde{\kappa}_{norm}(L) = s^2 (L_x^2 L_{yy} + L_y^2 L_{xx} - 2 L_x L_y L_{xy}) = s^2\, \tilde{\kappa}(L), \tag{95}$$

$$\mathcal{Q}_{(x,y),norm} L = s\, (L_x^2 + L_y^2) + C s^2 (L_{xx}^2 + 2 L_{xy}^2 + L_{yy}^2), \tag{96}$$

(and the corresponding unnormalized expressions are obtained by replacing s
by 1) (see footnote 9). For mixing first- and second-order derivatives in the
quasi quadrature entity $\mathcal{Q}_{(x,y),norm} L$, we use $C = 2/3$ or
$C = e/4$ according to (Lindeberg [52]).
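
As a hedged illustration, the scale-normalized expressions (92)-(96) can be evaluated pointwise from already computed Gaussian derivative responses. The sketch below takes the derivative values at a single image point as plain numbers; the function name, the dictionary keys and the sample values are hypothetical.

```python
import math

def spatial_invariants(Lx, Ly, Lxx, Lxy, Lyy, s=1.0, C=2.0 / 3.0):
    """Scale-normalized spatial differential invariants for gamma = 1.

    s is the spatial scale parameter (variance of the Gaussian kernel);
    setting s = 1 gives the unnormalized expressions.
    """
    return {
        "grad_mag": math.sqrt(s * (Lx ** 2 + Ly ** 2)),                       # (92)
        "laplacian": s * (Lxx + Lyy),                                        # (93)
        "det_hessian": s ** 2 * (Lxx * Lyy - Lxy ** 2),                      # (94)
        "curvature": s ** 2 * (Lx ** 2 * Lyy + Ly ** 2 * Lxx
                               - 2 * Lx * Ly * Lxy),                         # (95)
        "quasi_quadrature": s * (Lx ** 2 + Ly ** 2)
                            + C * s ** 2 * (Lxx ** 2 + 2 * Lxy ** 2 + Lyy ** 2),  # (96)
    }

inv = spatial_invariants(Lx=1.0, Ly=2.0, Lxx=3.0, Lxy=1.0, Lyy=-1.0, s=4.0)
```

For whole images, the same formulas would be applied elementwise to arrays of derivative responses.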

9 When using the Laplacian operator in this paper, the notation
$\nabla^2_{(x,y)}$ should be understood as the covariant expression
$\nabla^2_{(x,y)} = \nabla^T_{(x,y)} \nabla_{(x,y)}$ with
$\nabla_{(x,y)} = (\partial_x, \partial_y)^T$, etc.


8.4 Space-time coupled spatio-temporal derivative 
expressions 

A more general approach to spatio-temporal feature detec¬ 
tion than partial derivatives or directional derivatives con¬ 
sists of defining spatio-temporal derivative operators that com¬ 
bine spatial and temporal derivative operators in an inte¬ 
grated manner. 

Temporal derivatives of the spatial Laplacian. Inspired by the way neurons in
the lateral geniculate nucleus (LGN) respond to visual input (DeAngelis et al.
[12, 11]), which for many LGN cells can be modelled by idealized operations of
the form (Lindeberg [57, equation (108)])

$$h_{LGN}(x, y, t;\; s, \tau) = \pm (\partial_{xx} + \partial_{yy})\, g(x, y;\; s)\, h(t;\; \tau), \tag{97}$$

we can define the following differential entities

$$\partial_t(\nabla^2_{(x,y)} L) = L_{xxt} + L_{yyt} \tag{98}$$

$$\partial_{tt}(\nabla^2_{(x,y)} L) = L_{xxtt} + L_{yytt} \tag{99}$$

and combine these entities into a quasi quadrature measure over time of the
form

$$\mathcal{Q}_t(\nabla^2_{(x,y)} L) = \left( \partial_t(\nabla^2_{(x,y)} L) \right)^2 + C \left( \partial_{tt}(\nabla^2_{(x,y)} L) \right)^2, \tag{100}$$

where C again may be set to $C = 2/3$ or $C = e/4$. The first entity
$\partial_t(\nabla^2_{(x,y)} L)$ can be expected to give strong responses to
spatial blob responses whose intensity values vary over time, whereas the
second entity $\partial_{tt}(\nabla^2_{(x,y)} L)$ can be expected to give strong
responses to spatial blob responses whose intensity values vary strongly
around local minima or local maxima over time.

By combining these two entities into a quasi quadrature measure
$\mathcal{Q}_t(\nabla^2_{(x,y)} L)$ over time, we obtain a differential entity
that can be expected to give strong responses when the intensity varies
strongly over both image space and over time, while giving no response if
there are no intensity variations over space or time. Hence, these three
differential operators could be regarded as primitive spatio-temporal
interest operators that can be seen as compatible with existing knowledge
about neural processes in the LGN.


Temporal derivatives of the determinant of the spatial Hessian. Inspired by
the way local extrema of the determinant of the spatial Hessian (94) can be
shown to constitute a better interest point detector than local extrema of the
spatial Laplacian (93) (Lindeberg [60, 61]), we can compute corresponding
first- and second-order derivatives over time of the determinant of the
spatial Hessian

$$\partial_t(\det \mathcal{H}_{(x,y)} L) = L_{xxt} L_{yy} + L_{xx} L_{yyt} - 2 L_{xy} L_{xyt} \tag{101}$$

$$\partial_{tt}(\det \mathcal{H}_{(x,y)} L) = L_{xxtt} L_{yy} + 2 L_{xxt} L_{yyt} + L_{xx} L_{yytt} - 2 L_{xyt}^2 - 2 L_{xy} L_{xytt} \tag{102}$$

and combine these entities into a quasi quadrature measure over time

$$\mathcal{Q}_t(\det \mathcal{H}_{(x,y)} L) = \left( \partial_t(\det \mathcal{H}_{(x,y)} L) \right)^2 + C \left( \partial_{tt}(\det \mathcal{H}_{(x,y)} L) \right)^2. \tag{103}$$

As the determinant of the spatial Hessian can be expected to give strong
responses when there are strong intensity variations in two spatial
directions, the corresponding spatio-temporal operator
$\mathcal{Q}_t(\det \mathcal{H}_{(x,y)} L)$ can be expected to give strong
responses at such spatial points at which there are additionally strong
intensity variations over time as well.
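
The product-rule expansions (101) and (102) can be checked numerically. The following sketch is only a sanity check under invented assumptions: the second-order spatial derivatives at a fixed image point are modelled as toy polynomials in time (not taken from any data), and the closed-form expressions are compared against finite differences of the determinant of the Hessian.

```python
def det_hessian_dt(Lxx, Lxy, Lyy, Lxxt, Lxyt, Lyyt):
    """First-order temporal derivative of det H, equation (101)."""
    return Lxxt * Lyy + Lxx * Lyyt - 2 * Lxy * Lxyt

def det_hessian_dtt(Lxx, Lxy, Lyy, Lxxt, Lxyt, Lyyt, Lxxtt, Lxytt, Lyytt):
    """Second-order temporal derivative of det H, equation (102)."""
    return (Lxxtt * Lyy + 2 * Lxxt * Lyyt + Lxx * Lyytt
            - 2 * Lxyt ** 2 - 2 * Lxy * Lxytt)

def quasi_quadrature_t(d1, d2, C=2.0 / 3.0):
    """Temporal quasi quadrature measure, equation (103)."""
    return d1 ** 2 + C * d2 ** 2

# Toy temporal evolution of the spatial second-order derivatives (assumption).
def Lxx(t): return 1.0 + 2.0 * t + 3.0 * t ** 2
def Lxy(t): return 0.5 + t - t ** 2
def Lyy(t): return 2.0 - t + 0.5 * t ** 2

def det_h(t):
    return Lxx(t) * Lyy(t) - Lxy(t) ** 2

t0, h = 0.7, 1e-3
# Analytic temporal derivatives of the toy polynomials at t0.
Lxxt, Lxyt, Lyyt = 2.0 + 6.0 * t0, 1.0 - 2.0 * t0, -1.0 + t0
Lxxtt, Lxytt, Lyytt = 6.0, -2.0, 1.0

closed_form_dt = det_hessian_dt(Lxx(t0), Lxy(t0), Lyy(t0), Lxxt, Lxyt, Lyyt)
closed_form_dtt = det_hessian_dtt(Lxx(t0), Lxy(t0), Lyy(t0),
                                  Lxxt, Lxyt, Lyyt, Lxxtt, Lxytt, Lyytt)

# Central finite differences of det H over time for comparison.
finite_diff_dt = (det_h(t0 + h) - det_h(t0 - h)) / (2 * h)
finite_diff_dtt = (det_h(t0 + h) - 2 * det_h(t0) + det_h(t0 - h)) / h ** 2

response = quasi_quadrature_t(closed_form_dt, closed_form_dtt)
```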


Genuinely spatio-temporal interest operators. A less temporal slice oriented
and more genuine 3-D spatio-temporal approach to defining interest point
detectors from second-order spatio-temporal derivatives is by considering
feature detectors such as the determinant of the spatio-temporal Hessian
matrix

$$\det \mathcal{H}_{(x,y,t)} L = L_{xx} L_{yy} L_{tt} + 2 L_{xy} L_{xt} L_{yt} - L_{xx} L_{yt}^2 - L_{yy} L_{xt}^2 - L_{tt} L_{xy}^2, \tag{104}$$

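the rescaled spatio-temporal Gaussian curvature

$$\begin{aligned}
\mathcal{G}_{(x,y,t)}(L) = {} & \left( L_t (L_{xx} L_t - 2 L_x L_{xt}) + L_x^2 L_{tt} \right) \times \left( L_t (L_{yy} L_t - 2 L_y L_{yt}) + L_y^2 L_{tt} \right) \\
& - \left( L_t (-L_x L_{yt} + L_{xy} L_t - L_{xt} L_y) + L_x L_y L_{tt} \right)^2,
\end{aligned} \tag{105}$$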


which can be seen as a 3-D correspondence of the 2-D rescaled level curve
curvature operator $\tilde{\kappa}_{norm}(L)$ in equation (95), or possibly
trying to define a spatio-temporal Laplacian

$$\nabla^2_{(x,y,t)} L = L_{xx} + L_{yy} + \kappa^2 L_{tt}. \tag{106}$$

Detection of local extrema of the determinant of the spatio-temporal Hessian
has been proposed as a spatio-temporal interest point detector by (Willems
et al. [96]). Properties of the 3-D rescaled Gaussian curvature have been
studied in (Lindeberg [60]).

If aiming at defining a spatio-temporal analogue of the Laplacian operator,
one does, however, need to consider that the most straightforward way of
defining such an operator $\nabla^2_{(x,y,t)} L = L_{xx} + L_{yy} + L_{tt}$ is not
covariant under independent scaling of the spatial and temporal coordinates as
occurs if observing the same scene with cameras having independently different
spatial and temporal sampling rates. Therefore, the choice of the relative
weighting factor $\kappa^2$ between temporal vs. spatial derivatives introduced
in equation (106) is in principle arbitrary. By the homogeneity of the
determinant of the Hessian (104) and the spatio-temporal Gaussian curvature
(105) in terms of the orders of spatial vs. temporal differentiation that are
multiplied in each term, these expressions are on the other hand truly
covariant under independent rescalings of the spatial and temporal
coordinates and therefore better candidates for being used as spatio-temporal
interest operators, unless the relative scaling and weighting of temporal vs.
spatial coordinates can be handled by some complementary mechanism.
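
The homogeneity argument can be illustrated with a small numerical sketch. It assumes that a rescaling $x' = a x$, $y' = a y$, $t' = b t$ divides each derivative by $a$ raised to its spatial order and $b$ raised to its temporal order; the helper names and the derivative values are hypothetical.

```python
def det_hessian_3d(d):
    """Determinant of the spatio-temporal Hessian, equation (104)."""
    return (d["xx"] * d["yy"] * d["tt"] + 2 * d["xy"] * d["xt"] * d["yt"]
            - d["xx"] * d["yt"] ** 2 - d["yy"] * d["xt"] ** 2
            - d["tt"] * d["xy"] ** 2)

def plain_laplacian_3d(d):
    """Unweighted spatio-temporal Laplacian L_xx + L_yy + L_tt."""
    return d["xx"] + d["yy"] + d["tt"]

def rescale(d, a, b):
    """Derivatives after rescaling space by a and time by b: every spatial
    differentiation contributes a factor 1/a, every temporal one 1/b."""
    out = {}
    for key, value in d.items():
        n_space = sum(1 for ch in key if ch in "xy")
        n_time = key.count("t")
        out[key] = value / (a ** n_space * b ** n_time)
    return out

point_1 = {"xx": 2.0, "xy": 0.5, "yy": -1.0, "xt": 0.3, "yt": -0.7, "tt": 1.5}
point_2 = {"xx": 1.0, "xy": 0.2, "yy": 1.0, "xt": -0.4, "yt": 0.1, "tt": -0.5}
a, b = 2.0, 3.0

# det H is multiplied by the same factor 1 / (a**4 * b**2) at every point.
det_ratios = [det_hessian_3d(rescale(p, a, b)) / det_hessian_3d(p)
              for p in (point_1, point_2)]
# The unweighted Laplacian is multiplied by point-dependent factors.
lap_ratios = [plain_laplacian_3d(rescale(p, a, b)) / plain_laplacian_3d(p)
              for p in (point_1, point_2)]
```

Since the determinant changes by one common factor, the locations of its local extrema are preserved under such rescalings, whereas the point-dependent factors for the unweighted Laplacian show why its extrema need not be.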


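Spatio-temporal quasi quadrature entities. Inspired by the way the spatial
quasi quadrature measure $\mathcal{Q}_{(x,y)} L$ in (96) is defined as a measure
of the amount of information in first- and second-order spatial derivatives,
we may consider different types of spatio-temporal extensions of this entity

$$\begin{aligned}
\mathcal{Q}_{1,(x,y,t)} L = {} & L_x^2 + L_y^2 + \kappa^2 L_t^2 \\
& + C \left( L_{xx}^2 + 2 L_{xy}^2 + L_{yy}^2 + \kappa^2 (L_{xt}^2 + L_{yt}^2) + \kappa^4 L_{tt}^2 \right),
\end{aligned} \tag{107}$$

$$\begin{aligned}
\mathcal{Q}_{2,(x,y,t)} L & = \mathcal{Q}_t L \times \mathcal{Q}_{(x,y)} L \\
& = (L_t^2 + C L_{tt}^2) \times \left( L_x^2 + L_y^2 + C (L_{xx}^2 + 2 L_{xy}^2 + L_{yy}^2) \right),
\end{aligned} \tag{108}$$

$$\begin{aligned}
\mathcal{Q}_{3,(x,y,t)} L & = \mathcal{Q}_{(x,y)} L_t + C\, \mathcal{Q}_{(x,y)} L_{tt} \\
& = L_{xt}^2 + L_{yt}^2 + C (L_{xxt}^2 + 2 L_{xyt}^2 + L_{yyt}^2) \\
& \quad + C \left( L_{xtt}^2 + L_{ytt}^2 + C (L_{xxtt}^2 + 2 L_{xytt}^2 + L_{yytt}^2) \right),
\end{aligned} \tag{109}$$

where in the first expression when needed because of different
dimensionalities in terms of spatial vs. temporal derivatives, a free
parameter $\kappa$ has been included to adapt the differential expressions to
unknown relative scaling and thus weighting between the temporal vs. spatial
dimensions (see footnote 10).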
The formulation of these quasi quadrature entities is inspired by the
existence of non-linear complex cells in the primary visual cortex that:
(i) do not obey the superposition principle, (ii) have response properties
independent of the polarity of the stimuli and (iii) are rather insensitive to
the phase of the visual stimuli (Hubel and Wiesel [31, 32]). Specifically,
De Valois et al. [92] show that first- and second-order receptive fields
typically occur in pairs that can be modelled as approximate Hilbert pairs.

10 To make the differential entities in equations (107), (108) and (109)
fully consistent and meaningful, they do additionally have to be transformed
into scale-normalized derivatives as later done in equations (112), (113) and
(114). With scale-normalized derivatives for $\gamma = 1$, the resulting
scale-normalized derivatives then become dimensionless, which makes it
possible to add first- and second-order derivatives of the same variable (over
either space or time) in a scale-invariant manner. Then, similar arguments as
are used for deriving the blending parameter C between first- and second-order
temporal derivatives in (Lindeberg [52]) can be used for deriving a similar
blending parameter between first- and second-order spatial derivatives.

Within the framework of the presented spatio-temporal scale-space concept, it
is interesting to note that non-linear receptive fields with qualitatively
similar properties can be constructed by squaring first- and second-order
derivative responses and summing up these components (Koenderink and van
Doorn [40]). The use of the quasi quadrature model can therefore be
interpreted as a Gaussian derivative based analogue of energy models as
proposed by Adelson and Bergen [1] and Heeger [29]. To obtain local phase
independence over variations over both space and time simultaneously, we do
here additionally extend the notion of quasi quadrature to composed
space-time, by simultaneously summing up squares of odd and even filter
responses over both space and time, leading to quadruples or octuples of
filter responses, complemented by additional terms to achieve rotational
invariance over the spatial domain.

For the first quasi quadrature entity $\mathcal{Q}_{1,(x,y,t)} L$ to respond, it
is sufficient if there are intensity variations in the image data either over
space or over time. For the second quasi quadrature entity
$\mathcal{Q}_{2,(x,y,t)} L$ to respond, it is on the other hand necessary that
there are intensity variations in the image data over both space and time. For
the third quasi quadrature entity $\mathcal{Q}_{3,(x,y,t)} L$ to respond, it is
also necessary that there are intensity variations in the image data over both
space and time. Additionally, the third quasi quadrature entity
$\mathcal{Q}_{3,(x,y,t)} L$ requires there to be intensity variations over both
space and time for each primitive receptive field in terms of plain partial
derivatives that contributes to the output of the composed quadrature entity.
Conceptually, the third quasi quadrature entity can therefore be seen as more
related to the form of temporal quasi quadrature entity applied to the
idealized model of LGN cells in (100)

$$\mathcal{Q}_t(\nabla^2_{(x,y)} L) = \left( \nabla^2_{(x,y)} L_t \right)^2 + C \left( \nabla^2_{(x,y)} L_{tt} \right)^2 \tag{110}$$

with the difference that the spatial Laplacian operator $\nabla^2_{(x,y)}$
followed by squaring in (110) is here replaced by the spatial quasi quadrature
operator $\mathcal{Q}_{(x,y)}$.


These feature detectors can therefore be seen as biologi¬ 
cally inspired change detectors or as ways of measuring the 
combined strength of a set of receptive fields at any point, as 
possibly combined with variabilities over other parameters 
in the family of receptive fields. 


8.5 Scale normalized spatio-temporal derivative 
expressions 


For regular partial derivatives, normalization with respect to spatial and
temporal scales of a spatio-temporal scale-space derivative of order
$m = (m_1, m_2)$ over space and order $n$ over time is performed according to

$$L_{x_1^{m_1} x_2^{m_2} t^n, norm} = s^{(m_1 + m_2)/2}\, \alpha_n(\tau)\, L_{x_1^{m_1} x_2^{m_2} t^n}. \tag{111}$$

Scale normalization of the spatio-temporal differential expressions in
Section 8.4 is then performed by replacing each spatio-temporal partial
derivative by its corresponding scale-normalized expression (see [63] for
additional details).

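For example, for the three quasi quadrature entities in equations (107), (108)
and (109), their corresponding scale-normalized expressions are of the form:

$$\begin{aligned}
\mathcal{Q}_{1,(x,y,t),norm} L = {} & s\, (L_x^2 + L_y^2) + \alpha_1^2(\tau)\, \kappa^2 L_t^2 \\
& + C \left( s^2 (L_{xx}^2 + 2 L_{xy}^2 + L_{yy}^2) + s\, \alpha_1^2(\tau)\, \kappa^2 (L_{xt}^2 + L_{yt}^2) + \alpha_2^2(\tau)\, \kappa^4 L_{tt}^2 \right),
\end{aligned} \tag{112}$$

$$\begin{aligned}
\mathcal{Q}_{2,(x,y,t),norm} L & = \mathcal{Q}_{t,norm} L \times \mathcal{Q}_{(x,y),norm} L \\
& = \left( \alpha_1^2(\tau)\, L_t^2 + C\, \alpha_2^2(\tau)\, L_{tt}^2 \right) \times \left( s\, (L_x^2 + L_y^2) + C s^2 (L_{xx}^2 + 2 L_{xy}^2 + L_{yy}^2) \right),
\end{aligned} \tag{113}$$

$$\begin{aligned}
\mathcal{Q}_{3,(x,y,t),norm} L & = \mathcal{Q}_{(x,y),norm} L_t + C\, \mathcal{Q}_{(x,y),norm} L_{tt} \\
& = \alpha_1^2(\tau) \left( s\, (L_{xt}^2 + L_{yt}^2) + C s^2 (L_{xxt}^2 + 2 L_{xyt}^2 + L_{yyt}^2) \right) \\
& \quad + C\, \alpha_2^2(\tau) \left( s\, (L_{xtt}^2 + L_{ytt}^2) + C s^2 (L_{xxtt}^2 + 2 L_{xytt}^2 + L_{yytt}^2) \right).
\end{aligned} \tag{114}$$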
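
To make the different selectivity properties of the three entities concrete, the following sketch evaluates (112)-(114) pointwise from a dictionary of derivative values. It assumes variance-based temporal normalization, $\alpha_n^2(\tau) = \tau^n$, purely for illustration (the experiments in this paper use $L_p$-normalization instead); the dictionary layout, the function names and all numerical values are hypothetical.

```python
def alpha_sq(n, tau):
    """Squared temporal normalization factor; variance-based form (assumption)."""
    return tau ** n

def q_spatial_norm(d, s, C, suffix=""):
    """Scale-normalized spatial quasi quadrature (96) applied to L, L_t or L_tt,
    selected by suffix "", "t" or "tt". Missing derivatives count as zero."""
    g = lambda name: d.get(name + suffix, 0.0)
    return (s * (g("x") ** 2 + g("y") ** 2)
            + C * s ** 2 * (g("xx") ** 2 + 2 * g("xy") ** 2 + g("yy") ** 2))

def q1_norm(d, s, tau, kappa=1.0, C=2.0 / 3.0):
    """Equation (112)."""
    g = lambda name: d.get(name, 0.0)
    a1, a2 = alpha_sq(1, tau), alpha_sq(2, tau)
    return (s * (g("x") ** 2 + g("y") ** 2) + a1 * kappa ** 2 * g("t") ** 2
            + C * (s ** 2 * (g("xx") ** 2 + 2 * g("xy") ** 2 + g("yy") ** 2)
                   + s * a1 * kappa ** 2 * (g("xt") ** 2 + g("yt") ** 2)
                   + a2 * kappa ** 4 * g("tt") ** 2))

def q2_norm(d, s, tau, C=2.0 / 3.0):
    """Equation (113): temporal times spatial quasi quadrature."""
    g = lambda name: d.get(name, 0.0)
    q_t = alpha_sq(1, tau) * g("t") ** 2 + C * alpha_sq(2, tau) * g("tt") ** 2
    return q_t * q_spatial_norm(d, s, C)

def q3_norm(d, s, tau, C=2.0 / 3.0):
    """Equation (114): spatial quasi quadrature of L_t and L_tt."""
    return (alpha_sq(1, tau) * q_spatial_norm(d, s, C, "t")
            + C * alpha_sq(2, tau) * q_spatial_norm(d, s, C, "tt"))

# Static spatial structure: all temporal derivatives are zero.
static = {"x": 1.0, "y": 0.5, "xx": 0.2, "xy": 0.1, "yy": -0.3}
# Spatially uniform flicker: only purely temporal derivatives are non-zero.
flicker = {"t": 1.0, "tt": -0.5}
# Moving structure: spatial, temporal and mixed derivatives are non-zero.
moving = dict(static, t=0.8, tt=-0.2, xt=0.6, yt=0.3,
              xxt=0.1, xyt=0.05, yyt=-0.1, xtt=0.2, ytt=0.1)

s, tau = 4.0, 2.0
responses = {name: (q1_norm(d, s, tau), q2_norm(d, s, tau), q3_norm(d, s, tau))
             for name, d in (("static", static), ("flicker", flicker),
                             ("moving", moving))}
```

With these toy inputs, the first entity responds in all three cases, whereas the second and third entities respond only to the input that varies over both space and time.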


8.6 Experimental results 

Figure 7 shows the result of computing the above differential expressions for
a video sequence of a paddler in a kayak.

Comparing the spatio-temporal scale-space representation L in the top middle
figure to the original video f in the top left, we can first note that a
substantial amount of fine scale spatio-temporal textures, e.g. waves of the
water surface, is suppressed by the spatio-temporal smoothing operation. The
illustrations of the spatio-temporal scale-space representation L in the top
middle figure and its first- and second-order temporal derivatives
$L_{t,norm}$ and $L_{tt,norm}$ in the left and middle figures in the second row
do also show the spatio-temporal traces that are left by a moving object; see
in particular the image structures below the raised paddle that respond to
spatial points in the image domain where the paddle has been in the past.
Fig. 7 Spatio-temporal features computed from a video sequence in the UCF-101 dataset (Kayaking_g01_c01.avi, cropped) at spatial scale $\sigma_x = 2$ pixels
and temporal scale $\sigma_t = 0.2$ seconds using the proposed separable spatio-temporal receptive field model with Gaussian filtering over the
spatial domain and here a cascade of 7 recursive filters over the temporal domain with a logarithmic distribution of the intermediate scale levels for
$c = \sqrt{2}$ and with $L_p$-normalization of both the spatial and temporal derivative operators. Each figure shows a snapshot around frames 90-97 for
the spatial or spatio-temporal differential expression shown above the figure with in some cases additional monotone stretching of the magnitude
values to simplify visual interpretation. (Image size: 258 x 172 pixels of original 320 x 240 pixels and 226 frames at 25 frames per second.)
The slight jagginess in the bright response that can be seen below the paddle
in the response to the second-order temporal derivative $L_{tt,norm}$ is a
temporal sampling artefact caused by sparse temporal sampling in the original
video. With 25 frames per second there are 40 ms between adjacent frames,
during which a lot may happen in the spatial image domain for rapidly moving
objects. This situation can be compared to mammalian vision where many
receptive fields operate continuously over time scales in the range 20-100 ms.
With 40 ms between adjacent frames it is not possible to simulate such
continuous receptive fields smoothly over time, since such a frame rate
corresponds to either zero, one or at best two images within the effective
time span of the receptive field. To simulate rapid continuous time receptive
fields more accurately in a digital implementation, one should therefore
preferably aim at acquiring the input video with a higher temporal frame rate.
Such higher frame rates are indeed now becoming available, even in consumer
cameras. Despite this limitation in the input data, we can observe that the
proposed model is able to compute geometrically meaningful spatio-temporal
image features from the raw video.

The illustrations of $\partial_t(\nabla^2_{(x,y),norm} L)$ and
$\partial_{tt}(\nabla^2_{(x,y),norm} L)$ in the left and middle of the third row
show the responses of our idealized model of non-lagged and lagged LGN cells
complemented by a quasi-quadrature energy measure of these responses in the
right column. These entities correspond to applying a spatial Laplacian
operator to the first- and second-order temporal derivatives in the second row
and it can be seen how this operation enhances spatial variations. These
spatio-temporal entities can also be compared to the purely spatial interest
operators, the Laplacian $\nabla^2_{(x,y),norm} L$ and the determinant of the
Hessian $\det \mathcal{H}_{(x,y),norm} L$ in the first and second rows of the
third column. Note how the genuinely spatio-temporal receptive fields enhance
spatio-temporal structures compared to purely spatial operators and how static
structures, such as the label in the lower right corner, disappear altogether
under genuine spatio-temporal operators. The fourth row shows how three other
genuine spatio-temporal operators, the determinant of the spatio-temporal
Hessian $\det \mathcal{H}_{(x,y,t),norm} L$, the rescaled Gaussian curvature
$\mathcal{G}_{(x,y,t),norm} L$ and the quasi quadrature measure
$\mathcal{Q}_t(\det \mathcal{H}_{(x,y),norm} L)$, also respond to points where
there are simultaneously both strong spatial and strong temporal variations.

The bottom row shows three idealized models defined to mimic qualitatively
known properties of complex cells and expressed in terms of quasi quadrature
measures of spatio-temporal scale-space derivatives. For the first quasi
quadrature entity $\mathcal{Q}_{1,(x,y,t),norm} L$ to respond, in which time is
treated in a largely qualitatively similar manner as space, it is sufficient
if there are strong variations over either space or time. It can be seen that
this measure is therefore not highly selective. For the second and the third
entities $\mathcal{Q}_{2,(x,y,t),norm} L$ and $\mathcal{Q}_{3,(x,y,t),norm} L$, it
is necessary that there are simultaneous variations over both space and time,
and it can be seen how these entities are as a consequence more selective. For
the third entity $\mathcal{Q}_{3,(x,y,t),norm} L$, simultaneous selectivity over
both space and time is additionally enforced on each primitive linear
receptive field that is then combined into the non-linear quasi quadrature
measure. We can see how this quasi quadrature entity also responds more
strongly to the moving paddle than the two other quasi quadrature measures.


8.7 Geometric covariance and invariance properties 

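Rotations in image space. The spatial differential expressions
$|\nabla_{(x,y)} L|$, $\nabla^2_{(x,y)} L$, $\det \mathcal{H}_{(x,y)} L$,
$\tilde{\kappa}(L)$ and $\mathcal{Q}_{(x,y)} L$ are all invariant under rotations
in the image domain and so are the spatio-temporal derivative expressions
$\partial_t(\nabla^2_{(x,y)} L)$, $\partial_{tt}(\nabla^2_{(x,y)} L)$,
$\mathcal{Q}_t(\nabla^2_{(x,y)} L)$, $\partial_t(\det \mathcal{H}_{(x,y)} L)$,
$\partial_{tt}(\det \mathcal{H}_{(x,y)} L)$, $\mathcal{Q}_t(\det \mathcal{H}_{(x,y)} L)$,
$\det \mathcal{H}_{(x,y,t)} L$, $\mathcal{G}_{(x,y,t)} L$, $\nabla^2_{(x,y,t)} L$,
$\mathcal{Q}_{1,(x,y,t)} L$, $\mathcal{Q}_{2,(x,y,t)} L$ and
$\mathcal{Q}_{3,(x,y,t)} L$ as well as their corresponding scale-normalized
expressions.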

Uniform rescaling of the spatial domain. Under a uniform scaling
transformation of image space, the spatial differential invariants
$|\nabla_{(x,y)} L|$, $\nabla^2_{(x,y)} L$, $\det \mathcal{H}_{(x,y)} L$ and
$\tilde{\kappa}(L)$ are covariant under spatial scaling transformations in the
sense that their magnitude values are multiplied by a power of the scaling
factor, and so are their corresponding scale-normalized expressions. Also the
spatio-temporal differential invariants $\partial_t(\nabla^2_{(x,y)} L)$,
$\partial_{tt}(\nabla^2_{(x,y)} L)$, $\partial_t(\det \mathcal{H}_{(x,y)} L)$,
$\partial_{tt}(\det \mathcal{H}_{(x,y)} L)$, $\det \mathcal{H}_{(x,y,t)} L$ and
$\mathcal{G}_{(x,y,t)} L$ and their corresponding scale-normalized expressions
are covariant under spatial scaling transformations in the sense that their
magnitude values are multiplied by a power of the scaling factor under such
spatial scaling transformations.

The quasi quadrature entity $\mathcal{Q}_{(x,y),norm} L$ is however not
covariant under spatial scaling transformations, and neither are the
spatio-temporal differential invariants $\mathcal{Q}_{t,norm}(\nabla^2_{(x,y)} L)$,
$\mathcal{Q}_{t,norm}(\det \mathcal{H}_{(x,y)} L)$, $\mathcal{Q}_{1,(x,y,t),norm} L$,
$\mathcal{Q}_{2,(x,y,t),norm} L$ and $\mathcal{Q}_{3,(x,y,t),norm} L$. Due to the
form of $\mathcal{Q}_{(x,y),norm} L$, $\mathcal{Q}_{t,norm}(\nabla^2_{(x,y)} L)$,
$\mathcal{Q}_{t,norm}(\det \mathcal{H}_{(x,y)} L)$, $\mathcal{Q}_{2,(x,y,t),norm} L$
and $\mathcal{Q}_{3,(x,y,t),norm} L$ as being composed of sums of
scale-normalized derivative expressions for $\gamma = 1$, these derivative
expressions can, however, anyway be made scale invariant when combined with a
spatial scale selection mechanism.

Uniform rescaling of the temporal domain independent of the spatial domain.
Under an independent rescaling of the temporal dimension while keeping the
spatial dimension fixed, the partial derivatives
$L_{x_1^{m_1} x_2^{m_2} t^n}(x_1, x_2, t;\; s, \tau)$ are covariant under such
temporal rescaling transformations, and so are the directional derivatives
$L_{\varphi^{m_1} \perp\varphi^{m_2} t^n}$ for image velocity $v = 0$. For non-zero
image velocities, the image velocity parameters of the receptive field would
on the other hand need to be adapted to the local motion direction of the
objects/spatio-temporal events of interest to enable matching between
corresponding spatio-temporal directional derivative operators.

Under an independent rescaling of the temporal dimension while keeping the
spatial dimension fixed, also the spatio-temporal differential invariants
$\partial_t(\nabla^2_{(x,y)} L)$, $\partial_{tt}(\nabla^2_{(x,y)} L)$,
$\partial_t(\det \mathcal{H}_{(x,y)} L)$, $\partial_{tt}(\det \mathcal{H}_{(x,y)} L)$,
$\det \mathcal{H}_{(x,y,t)} L$ and $\mathcal{G}_{(x,y,t)} L$ are covariant under
independent rescaling of the temporal vs. spatial dimensions. The same applies
to their corresponding scale-normalized expressions.

The spatio-temporal differential invariants
$\mathcal{Q}_{t,norm}(\nabla^2_{(x,y)} L)$,
$\mathcal{Q}_{t,norm}(\det \mathcal{H}_{(x,y)} L)$, $\mathcal{Q}_{1,(x,y,t),norm} L$,
$\mathcal{Q}_{2,(x,y,t),norm} L$ and $\mathcal{Q}_{3,(x,y,t),norm} L$ are however
not covariant under independent rescaling of the temporal vs. spatial
dimensions and would therefore need a temporal scale selection mechanism to
enable temporal scale invariance.


8.8 Invariance to illumination variations and exposure
control mechanisms

Because all these expressions are composed of spatial, temporal and
spatio-temporal derivatives of non-zero order, it follows that all these
differential expressions are invariant under additive illumination
transformations of the form $L \mapsto L + C$.

This means that if we take the image values f as representing the logarithm
of the incoming energy, $f \sim \log I$ or
$f \sim \log I^{\gamma} = \gamma \log I$, then all these differential
expressions will be invariant under local multiplicative illumination
transformations of the form $I \mapsto C I$, implying $L \sim \log I + \log C$
or $L \sim \log I^{\gamma} = \gamma (\log I + \log C)$. Thus, these differential
expressions will be invariant to local multiplicative variabilities in the
external illumination (with locality defined as over the support region of the
spatio-temporal receptive field) or multiplicative exposure control parameters
such as the aperture of the lens and the integration time or the sensitivity
of the sensor.

More formally, let us assume (i) a perspective camera model extended with
(ii) a thin circular lens for gathering incoming light from different
directions and (iii) a Lambertian illumination model extended with (iv) a
spatially varying albedo factor for modelling the light that is reflected from
surface patterns in the world. Then, by theoretical results in (Lindeberg
[57, section 2.3]) a spatio-temporal receptive field response
$L_{x^{m_1} y^{m_2} t^n}(\cdot, \cdot;\; s, \tau)$, where $T_{s,\tau}$ represents the
spatio-temporal smoothing operator, can be expressed as

$$L_{x^{m_1} y^{m_2} t^n} = \partial_{x^{m_1} y^{m_2} t^n}\, T_{s,\tau} \left( \log \rho(x, y, t) + \log i(x, y, t) + \log C_{cam}(\tilde{f}(t)) + V(x, y) \right) \tag{115}$$

where (i) $\rho(x, y, t)$ is a spatially dependent albedo factor,
(ii) $i(x, y, t)$ denotes a spatially dependent illumination field,
(iii) $C_{cam}(\tilde{f}(t))$ represents possibly time-varying internal camera
parameters and (iv) $V(x, y) = -2 \log(1 + x^2 + y^2)$ represents a geometric
natural vignetting effect.

From the structure of equation (115) we can note that for any non-zero order
of spatial differentiation $m_1 + m_2 > 0$, the influence of the internal
camera parameters in $C_{cam}(\tilde{f}(t))$ will disappear because of the
spatial differentiation with respect to $x_1$ or $x_2$, and so will the effects
of any other multiplicative exposure control mechanism. Furthermore, for any
multiplicative illumination variation $i'(x, y) = C\, i(x, y)$, where C is a
scalar constant, the logarithmic luminosity will be transformed as
$\log i'(x, y) = \log C + \log i(x, y)$, which implies that the dependency on C
will disappear after spatial differentiation. For purely temporal derivative
operators that do not involve any order of spatial differentiation, such as
the first- and second-order derivative operators $L_t$ and $L_{tt}$, strong
responses may on the other hand be obtained due to illumination compensation
mechanisms that vary over time as the result of rapid variations in the
illumination. If one wants to design spatio-temporal feature detectors that
are robust to illumination variations and to variations in exposure
compensation mechanisms caused by these, it is therefore essential to include
non-zero orders of spatial differentiation. The use of Laplacian-like
filtering in the first stages of visual processing in the retina and the LGN
can therefore be interpreted as a highly suitable design to achieve robustness
to illumination variations and adaptive variations in the diameter of the
pupil caused by these, while still being expressed in terms of rotationally
symmetric linear receptive fields over the spatial domain.
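
The argument above can be illustrated with a small numerical sketch, under the simplifying assumptions of a static 1-D spatial pattern, a purely multiplicative gain that varies over time, and unit-step differences in place of smoothed derivatives. The pattern, the gain functions and all names are invented for this illustration.

```python
import math

def log_brightness(x, t, gain):
    """Logarithmic brightness of a static toy pattern under a time-varying gain."""
    pattern = 2.0 + math.sin(0.3 * x)      # static spatial pattern (toy assumption)
    return math.log(gain(t) * pattern)

def d_x(f, x, t):
    """Spatial unit-step difference."""
    return f(x + 1, t) - f(x, t)

def d_t(f, x, t):
    """Temporal unit-step difference."""
    return f(x, t + 1) - f(x, t)

def d_xt(f, x, t):
    """Mixed spatio-temporal unit-step difference."""
    return d_x(f, x, t + 1) - d_x(f, x, t)

def constant_gain(t):
    return 1.0

def drifting_gain(t):
    return 1.0 + 0.5 * t                   # exposure changing over time (toy)

def f_const(x, t):
    return log_brightness(x, t, constant_gain)

def f_drift(x, t):
    return log_brightness(x, t, drifting_gain)

x0, t0 = 5, 3
spatial_const = d_x(f_const, x0, t0)
spatial_drift = d_x(f_drift, x0, t0)       # unaffected by the gain
temporal_drift = d_t(f_drift, x0, t0)      # responds to the gain variation alone
mixed_drift = d_xt(f_drift, x0, t0)        # gain cancels after spatial differencing
```

In this toy setting the purely temporal difference responds to the exposure drift although nothing moves in the scene, whereas every difference that involves a spatial step is unaffected by the drift.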

If we extend this model to the simplest form of position- and time-dependent
illumination and/or exposure variations as modelled on the form

$$L \mapsto L + A x + B y + C t \tag{116}$$

then we can see that the spatio-temporal differential invariants
$\partial_t(\nabla^2_{(x,y)} L)$, $\partial_{tt}(\nabla^2_{(x,y)} L)$,
$\mathcal{Q}_t(\nabla^2_{(x,y)} L)$, $\partial_t(\det \mathcal{H}_{(x,y)} L)$,
$\partial_{tt}(\det \mathcal{H}_{(x,y)} L)$, $\mathcal{Q}_t(\det \mathcal{H}_{(x,y)} L)$,
$\det \mathcal{H}_{(x,y,t)} L$, $\nabla^2_{(x,y,t)} L$ and
$\mathcal{Q}_{3,(x,y,t)} L$ are all invariant under such position- and
time-dependent illumination and/or exposure variations.
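
A minimal numerical check of this property, assuming central differences on a unit grid as stand-ins for the smoothed derivative operators, is sketched below. The test function and the ramp coefficients are arbitrary illustrative choices.

```python
import math

def L(x, y, t):
    """Arbitrary smooth toy function standing in for the scale-space representation."""
    return math.sin(0.4 * x) * math.cos(0.3 * y) * math.exp(-0.1 * t)

def with_ramp(A, B, C):
    """L transformed according to L -> L + A x + B y + C t."""
    return lambda x, y, t: L(x, y, t) + A * x + B * y + C * t

def L_t(f, x, y, t):
    """First-order temporal central difference."""
    return (f(x, y, t + 1) - f(x, y, t - 1)) / 2.0

def L_xx(f, x, y, t):
    """Second-order spatial central difference."""
    return f(x + 1, y, t) - 2.0 * f(x, y, t) + f(x - 1, y, t)

def L_xt(f, x, y, t):
    """Mixed spatio-temporal central difference."""
    return (f(x + 1, y, t + 1) - f(x - 1, y, t + 1)
            - f(x + 1, y, t - 1) + f(x - 1, y, t - 1)) / 4.0

ramped = with_ramp(A=0.2, B=-0.1, C=0.5)
p = (3, 4, 5)

# Second-order and mixed differences are unaffected by the ramp ...
diff_xx = L_xx(ramped, *p) - L_xx(L, *p)
diff_xt = L_xt(ramped, *p) - L_xt(L, *p)
# ... whereas the purely temporal first-order difference is offset by C.
diff_t = L_t(ramped, *p) - L_t(L, *p)
```

The offset in the first-order temporal difference is the reason why entities that contain $L_t$, $L_x$ or $L_y$ in isolation do not share this invariance.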

The quasi quadrature entities $\mathcal{Q}_{1,(x,y,t)} L$ and
$\mathcal{Q}_{2,(x,y,t)} L$ are however not invariant to such position- and
time-dependent illumination variations. This property can in particular be
noted for the quasi quadrature entity $\mathcal{Q}_{1,(x,y,t)} L$, for which what
appear to be initial time-varying exposure compensation mechanisms in the
camera lead to large responses in the initial part of the video sequence (see
Figure 8 (left)). Out of the three quasi quadrature entities
$\mathcal{Q}_{1,(x,y,t)} L$, $\mathcal{Q}_{2,(x,y,t)} L$ and
$\mathcal{Q}_{3,(x,y,t)} L$, the third quasi quadrature entity does therefore
possess the best robustness properties to illumination variations (see
Figure 8 (right)).

Fig. 8 Illustration of the influence of temporal illumination or exposure compensation mechanisms on spatio-temporal receptive field responses,
computed from the video sequence Kayaking_g01_c01.avi (cropped) in the UCF-101 dataset. Each figure shows a snapshot at frame 8 for the quasi
quadrature entity shown above the figure with additional monotone stretching of the magnitude values to simplify visual interpretation. Note how
the time varying illumination or exposure compensation leads to a strong overall response in the first quasi quadrature entity $\mathcal{Q}_{1,(x,y,t),norm} L$
caused by strong responses in the purely temporal derivatives $L_t$ and $L_{tt}$, whereas the responses of the second and the third quasi quadrature entities
$\mathcal{Q}_{2,(x,y,t),norm} L$ and $\mathcal{Q}_{3,(x,y,t),norm} L$ are much less influenced. Indeed, for a logarithmic brightness scale the third quasi quadrature entity
$\mathcal{Q}_{3,(x,y,t),norm} L$ is invariant under such multiplicative illumination or exposure compensation variations.

9 Summary and discussion 

We have presented an improved computational model for 
spatio-temporal receptive fields based on time-causal and 
time-recursive spatio-temporal scale-space representation de¬ 
fined from a set of first-order integrators or truncated expo¬ 
nential filters coupled in cascade over the temporal domain 
in combination with a Gaussian scale-space concept over the 
spatial domain. This model can be efficiently implemented 
in terms of recursive filters over time and we have shown 
how the continuous model can be transferred to a discrete 
implementation while retaining discrete scale-space proper¬ 
ties. Specifically, we have analysed how remaining design 
parameters within the theory, in terms of the number of first- 
order integrators coupled in cascade and a distribution pa¬ 
rameter of a logarithmic distribution, affect the temporal re¬ 
sponse dynamics in terms of temporal delays. 
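
The time-recursive structure summarized above can be illustrated by a minimal sketch. It assumes logarithmically distributed scale levels $\tau_k = c^{2(k-K)} \tau$ and, as an assumption not restated in this section, that a discrete first-order recursive filter with time constant $\mu$ contributes the variance $\mu^2 + \mu$. The function names `log_time_constants` and `cascade_filter` are hypothetical and chosen for illustration only.

```python
import numpy as np

def log_time_constants(tau, c, K):
    """Discrete time constants mu_k for K recursive filters in cascade."""
    # Logarithmically distributed scale levels: tau_k = c^(2(k-K)) * tau, k = 1..K
    k = np.arange(1, K + 1)
    tau_k = tau * c ** (2.0 * (k - K))
    # Variance increment contributed by each stage (with tau_0 = 0)
    dtau = np.diff(np.concatenate(([0.0], tau_k)))
    # Assumed discrete relation: a first-order recursive filter with time
    # constant mu has variance mu^2 + mu, so solve mu^2 + mu = dtau
    return (np.sqrt(1.0 + 4.0 * dtau) - 1.0) / 2.0

def cascade_filter(signal, mus):
    """Apply first-order recursive filters in cascade, causally over time."""
    out = np.asarray(signal, dtype=float)
    for mu in mus:
        state = 0.0
        res = np.empty_like(out)
        for t, x in enumerate(out):
            # Only the previous output is stored: time-recursive update
            state = state + (x - state) / (1.0 + mu)
            res[t] = state
        out = res
    return out

# Impulse response of the composed kernel for tau = 4, c = 2, K = 4
impulse = np.zeros(400)
impulse[0] = 1.0
kernel = cascade_filter(impulse, log_time_constants(4.0, 2.0, 4))
t = np.arange(kernel.size)
mean = np.sum(t * kernel)
variance = np.sum((t - mean) ** 2 * kernel)
```

Under these assumptions the composed kernel is non-negative, sums to one and has variance equal to the requested temporal scale, while each stage only needs its own previous output as memory.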

Compared to other spatial and temporal scale-space representations
based on continuous scale parameters, a conceptual
difference with the temporal scale-space representation
underlying the proposed spatio-temporal receptive fields
is that the temporal scale levels have to be discrete. Thereby,
we sacrifice a continuous scale parameter and full scale invariance
as resulting from the Gaussian scale-space concepts
based on causality or non-enhancement of local extrema
(Koenderink [38]; Lindeberg [56]) or used as a scale-space
axiom in certain axiomatic scale-space formulations
(Iijima [34]; Florack et al. [23]; Pauwels et al. [77]; Weickert
et al. [94, 93, 95]; Duits et al.; Fagerström [16, 17]);
see also Witkin [97], Babaud et al., Yuille and Poggio [98],
Koenderink and van Doorn [40, 41], Lindeberg
[43, 48, 49, 50, 51, 58], Florack et al. [22, 23, 21], Alvarez et al.
[2], Guichard [26], ter Haar Romeny et al. [28, 27], Felsberg
and Sommer [19] and Tschirsich and Kuijper [90] for other
scale-space approaches closely related to this work, as well
as Fleet and Langley [20], Freeman and Adelson [25],
Simoncelli et al. [89] and Perona [78] for more filter-oriented
approaches, Miao and Rao [74], Duits and Burgeth [13],
Cocci et al., Barbieri et al. and Sharma and Duits
for Lie group approaches for receptive fields and Lindeberg
and Friberg [67, 68] for the application of closely related
principles for deriving idealized computational models
of auditory receptive fields.

When using a logarithmic distribution of the intermedi¬ 
ate scale levels, we have however shown that by a limit con¬ 
struction when the number of intermediate temporal scale 
levels tends to infinity, we can achieve true self-similarity 
and scale invariance over a discrete set of scaling factors. 
For a vision system intended to operate in real time using 
no other explicit storage of visual data from the past than 
a compact time-recursive buffer of spatio-temporal scale- 
space at different temporal scales, the loss of a continuous 
temporal scale parameter may however be less of a practical 
constraint, since one would anyway have to discretize the 
temporal scale levels in advance to be able to register the 
image data to be able to perform any computations at all. 
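
A small numerical sketch may help to make the self-similarity over discrete scaling factors concrete. It assumes that the Fourier transform of the limit kernel is the $K \to \infty$ limit of the product form given in appendix A, i.e. an infinite product of first-order factors with time constants $c^{-k} \sqrt{c^2 - 1} \sqrt{\tau}$ for $k = 1, 2, \dots$; the function name `limit_kernel_ft` and the truncation to 200 factors are illustrative choices.

```python
import numpy as np

def limit_kernel_ft(omega, tau, c, n_terms=200):
    """Truncated infinite product for the Fourier transform of the limit kernel."""
    k = np.arange(1, n_terms + 1)
    # Assumed time constants of the limit case: c^(-k) sqrt(c^2 - 1) sqrt(tau)
    mu = c ** (-k.astype(float)) * np.sqrt(c * c - 1.0) * np.sqrt(tau)
    return np.prod(1.0 / (1.0 + 1j * mu * omega))

c, tau, omega = 2.0, 1.0, 0.7
coarse = limit_kernel_ft(omega, c * c * tau, c)
# Self-similarity: rescaling the frequency by c is equivalent to tau -> c^2 tau
rescaled = limit_kernel_ft(c * omega, tau, c)
# Cascade property: one more first-order integrator takes scale tau to c^2 tau
mu_extra = np.sqrt(c * c - 1.0) * np.sqrt(tau)
cascaded = limit_kernel_ft(omega, tau, c) / (1.0 + 1j * mu_extra * omega)
```

In this sketch `coarse`, `rescaled` and `cascaded` agree up to truncation and rounding errors, which reflects that a scaling by the factor $c$ maps the family of limit kernels onto itself and that adjacent temporal scale levels are linked by a single first-order integrator.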

In the special case when all the time constants of the
first-order integrators are equal, the resulting temporal smoothing
kernels in the continuous model (29) correspond to Laguerre
functions (Laguerre polynomials multiplied by a truncated
exponential kernel), which have been previously used
for modelling the temporal response properties of neurons
in the visual system (den Brinker and Roufs) and for
computing spatio-temporal image features in computer vision
(Rivero-Moreno and Bres; Berg et al.). Regarding
the corresponding discrete model with all time constants
equal, the corresponding discrete temporal smoothing
kernels approach Poisson kernels when the number of temporal
smoothing steps increases while keeping the variance
of the composed kernel fixed (Lindeberg and Fagerström
[66]). Such Poisson kernels have also been used for modelling
biological vision (Fuortes and Hodgkin [24]). Compared
to the special case with all time constants equal, a
logarithmic distribution of the intermediate temporal scale
levels does on the other hand allow for larger flexibility
in the trade-off between temporal smoothing and temporal
response characteristics, specifically enabling faster temporal
responses (shorter temporal delays) and higher computational
efficiency when computing multiple temporal or
spatio-temporal receptive field responses involving coarser
temporal scales.

From the detailed analysis in section 5 and appendix A
we can conclude that when the number of first-order integrators
that are coupled in cascade increases while keeping the
variance of the composed kernel fixed, the time-causal kernels
obtained by composing truncated exponential kernels
with equal time constants in cascade tend to a limit kernel
with skewness and kurtosis measures zero, or equivalently
third- and fourth-order cumulants equal to zero, whereas the
time-causal kernels obtained by composing truncated exponential
kernels having a logarithmic distribution of the intermediate
scale levels tend to a limit kernel with non-zero
skewness and non-zero kurtosis. This property reveals a fundamental
difference between the two classes of time-causal
scale-space kernels based on either a logarithmic or a uniform
distribution of the intermediate temporal scale levels.
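
This difference can be checked numerically with a short sketch. It relies on two standard facts that are assumptions relative to this section: a truncated exponential kernel with time constant $\mu$ has cumulants $(n-1)!\,\mu^n$, and cumulants add under convolution. The helper names `time_constants` and `skewness_kurtosis` are hypothetical.

```python
import numpy as np

def time_constants(tau, K, c=None):
    """Continuous time constants mu_k; uniform if c is None, else logarithmic."""
    k = np.arange(1, K + 1)
    if c is None:
        tau_k = tau * k / K                    # uniform: tau_k = k tau / K
    else:
        tau_k = tau * c ** (2.0 * (k - K))     # logarithmic: tau_k = c^(2(k-K)) tau
    # mu_k = sqrt(tau_k - tau_(k-1)) with tau_0 = 0
    return np.sqrt(np.diff(np.concatenate(([0.0], tau_k))))

def skewness_kurtosis(mus):
    """Skewness and (excess) kurtosis of a cascade of truncated exponentials."""
    # Each stage has cumulants (n-1)! mu^n; cumulants add under convolution
    k2 = np.sum(mus ** 2)
    k3 = 2.0 * np.sum(mus ** 3)
    k4 = 6.0 * np.sum(mus ** 4)
    return k3 / k2 ** 1.5, k4 / k2 ** 2

uniform = skewness_kurtosis(time_constants(1.0, 16))
logarithmic = skewness_kurtosis(time_constants(1.0, 30, c=2.0))
```

For the uniform case the values follow $2/\sqrt{K}$ and $6/K$ and thus vanish as $K$ grows, whereas for the logarithmic case with $c = 2$ they stay close to the limit values $2(c+1)\sqrt{c^2-1}/(c^2+c+1)$ and $6(c^2-1)/(c^2+1)$ derived in appendix A.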

In a complementary analysis in appendix B we have
also shown how our time-causal kernels can be related to
the temporal kernels in Koenderink’s scale-time model [39].
By identifying the first- and second-order temporal moments 
of the two classes of kernels, we have derived closed-form 
expressions to relate the parameters between the two mod¬ 
els, and showed that although the two classes of kernels to 
a large extent share qualitatively similar properties, the two 
classes of kernels differ significantly in terms of their third- 
and fourth-order skewness and kurtosis measures. 

The closed-form expressions for Koenderink’s scale-time 
kernels are analytically simpler than the explicit expressions 
for our kernels, which will be sums of truncated exponential 
kernels for all the time constants with the coefficients determined
from a partial fraction expansion. In this respect, the
derived mapping between the parameters of our and Koenderink’s
models can be used e.g. for estimating the time of
the temporal maximum of our kernels, which would otherwise
have to be determined numerically. Our kernels do
on the other hand have a clear computational advantage in 


that they are truly time-recursive, meaning that the primi¬ 
tive first-order integrators in the model contain sufficient in¬ 
formation for updating the model to new states over time, 
whereas the kernels in Koenderink’s scale-time model ap¬ 
pear to require a complete memory of the past, since they do 
not have any known time-recursive formulation. 

Regarding the purely temporal scale-space concept used 
in our spatio-temporal model, we have notably replaced the 
assumption of a semi-group structure over temporal scales 
by a weaker Markov property, which however anyway guar¬ 
antees a necessary cascade property over temporal scales, 
to ensure gradual simplification of the temporal scale-space 
representation from any finer to any coarser temporal scale. 
By this relaxation of the requirement of a semi-group over 
temporal scales, we have specifically been able to define a 
temporal scale-space concept with much better temporal dynamics
than the time-causal semi-groups derived by Fagerström
and Lindeberg [56]. Since this new time-causal
temporal scale-space concept with a logarithmic distribution
of the intermediate temporal scale levels would not be found
if one would start from the assumption about a semi-group
over temporal scales as a necessary requirement, we propose
that in the area of scale-space axiomatics the assumption
of a semi-group over temporal scales should not be regarded
as a necessary requirement for a time-causal temporal
scale-space representation.


Recently, and during the development of this article, Mahmoudi
[70] has presented a very closely related while more
neurophysiologically motivated model for visual receptive
fields, based on an electrical circuit model with spatial smoothing
determined by local spatial connections over a spatial
grid and temporal smoothing by first-order temporal integration.
The spatial component in that model is very closely
related to our earlier discrete scale-space models over spatial
and spatio-temporal grids (Lindeberg [45, 51, 54]) as can be
modelled by Z-transforms of the discrete convolution kernels
and an algebra of spatial or spatio-temporal covariance
matrices to model the transformation properties of the receptive
fields under locally linearized geometric image transformations.
The temporal component in that model is in turn
similar to our temporal smoothing model by first-order integrators
coupled in cascade as initially proposed in (Lindeberg
[45]; Lindeberg and Fagerström [66]), suggested as one
of three models for temporal smoothing in spatio-temporal
visual receptive fields in (Lindeberg [57, 58]) and then refined
and further developed in (Lindeberg [62, 63]) and this
article. Our model can also be implemented by electric circuits,
by combining the temporal electric model in figure 2
with the spatial discretization in section 6.3 or more general
connectivities between adjacent layers to implement
velocity-adapted receptive fields as can then be described by their
resulting spatio-temporal covariance matrices. Mahmoudi compares
such electrically modelled receptive fields to results of
neurophysiological recordings in the LGN and the primary
visual cortex in a similar way as we compared our theoretically
derived receptive fields to biological receptive fields in
(Lindeberg [51, 56, 57, 62]) and in this article.

Mahmoudi shows that the resulting transfer function in
the layered electric circuit model approaches a Gaussian when
the number of layers tends to infinity. This result agrees
with our earlier results that the discrete scale-space kernels
over a discrete spatial grid approach the continuous Gaussian
when the spatial scale increment tends to zero while the
spatial scale level is held constant [45] and that the temporal
smoothing function corresponding to a set of first-order integrators
with equal time constants coupled in cascade tends
to the Poisson kernel (which in turn approaches the Gaussian
kernel) when the temporal scale increment tends to zero
while the temporal scale level is held constant [66].

In his article, Mahmoudi [70] makes a distinction between
our scale-space approach, which is motivated by the
mathematical structure of the environment in combination
with a set of assumptions about the internal structure of a
vision system to guarantee internal consistency between image
representations at different spatial and temporal scales,
and his model motivated by assumptions about neurophysiology.
One way to reconcile these views is by following the
evolutionary arguments proposed in (Lindeberg [57, 59]). If
there is a strong evolutionary pressure on a living organism
that uses vision as a key source of information about its environment
(as there should be for many higher mammals),
then in the competition between two species or two individuals
from the same species, there should be a strong evolutionary
advantage for an organism that as much as possible
adapts the structure of its vision system to be consistent with
the structural and transformation properties of its environment.
Hence, there could be an evolutionary pressure for the
vision system of such an organism to develop similar types
of receptive fields as can be derived by an idealized mathematical
theory, and specifically develop neurophysiological
wetware that permits the computation of sufficiently good
approximations to idealized receptive fields as derived from
mathematical and physical principles. From such a viewpoint,
it is highly interesting to see that the neurophysiological
cell recordings in the LGN and the primary visual cortex
presented by DeAngelis et al. [11, 12] are in very good
qualitative agreement with the predictions generated by our
mathematically and physically motivated normative theory
(see figures 3 and 4).

Given the derived time-causal and time-recursive formu¬ 
lation of our basic linear spatio-temporal receptive fields, 
we have described how this theory can be used for com¬ 
puting different types of both linear and non-linear scale- 
normalized spatio-temporal features. Specifically, we have 
emphasized how scale normalization by $L_p$-normalization
leads to fundamentally different results compared to more 


traditional variance-based normalization. By the formula¬ 
tion of the corresponding scale normalization factors for dis¬ 
crete temporal scale space, we have also shown how they 
permit the formulation of an operational criterion to estimate 
how many intermediate temporal scale levels are needed to 
approximate true scale invariance up to a given tolerance. 

Finally, we have shown how different types of spatio-
temporal features can be defined in terms of spatio-temporal
differential invariants built from spatio-temporal receptive 
field responses, including their transformation properties un¬ 
der natural image transformations, with emphasis on inde¬ 
pendent scaling transformations over space vs. time, rota¬ 
tional invariance over the spatial domain and illumination 
and exposure control variations. We propose that the pre¬ 
sented theory can be used for computing features for generic 
purposes in computer vision and for computational mod¬ 
elling of biological vision for image data over a time-causal 
spatio-temporal domain, in an analogous way as the Gaus¬ 
sian scale-space concept constitutes a canonical model for 
processing image data over a purely spatial domain. 

A Frequency analysis of the time-causal kernels

In this appendix, we will perform an in-depth analysis of the proposed 
time-causal scale-space kernels with regard to their frequency proper¬ 
ties and moment descriptors derived via the Fourier transform, both 
for the case of a logarithmic distribution of the intermediate tempo¬ 
ral scale levels and a uniform distribution of the intermediate temporal 
scale levels. Specifically, the results to be derived will provide a way to 
characterize properties of the limit kernel when the number of temporal 
scale levels K tends to infinity. 


A.1 Logarithmic distribution of the intermediate scale
levels

In section 5 we gave the following explicit expressions for the Fourier
transform of the time-causal kernels based on a logarithmic distribution
of the intermediate scale levels

$$\hat{h}_{exp}(\omega;\; \tau, c, K) = \frac{1}{1 + i\, c^{1-K} \sqrt{\tau}\, \omega} \prod_{k=2}^{K} \frac{1}{1 + i\, c^{k-K-1} \sqrt{c^2-1}\, \sqrt{\tau}\, \omega} \tag{117}$$

for which the magnitude and the phase are given by

$$|\hat{h}_{exp}(\omega;\; \tau, c, K)| = \frac{1}{\sqrt{1 + c^{2(1-K)}\, \tau \omega^2}} \prod_{k=2}^{K} \frac{1}{\sqrt{1 + c^{2(k-K-1)} (c^2-1)\, \tau \omega^2}}, \tag{118}$$

$$\arg \hat{h}_{exp}(\omega;\; \tau, c, K) = -\arctan\left(c^{1-K} \sqrt{\tau}\, \omega\right) - \sum_{k=2}^{K} \arctan\left(c^{k-K-1} \sqrt{c^2-1}\, \sqrt{\tau}\, \omega\right). \tag{119}$$

Let us rewrite the magnitude of the Fourier transform on exponential
form

$$|\hat{h}_{exp}(\omega;\; \tau, c, K)| = e^{\log |\hat{h}_{exp}(\omega;\; \tau, c, K)|} = e^{-\frac{1}{2} \log\left(1 + c^{2(1-K)} \tau \omega^2\right) - \frac{1}{2} \sum_{k=2}^{K} \log\left(1 + c^{2(k-K-1)} (c^2-1)\, \tau \omega^2\right)} \tag{120}$$

and compute the Taylor expansion of

$$\log |\hat{h}_{exp}(\omega;\; \tau, c, K)| = C_2\, \omega^2 + C_4\, \omega^4 + O(\omega^6) \tag{121}$$

where

$$C_2 = -\frac{\tau}{2}, \tag{122}$$

$$C_4 = -\frac{\tau^2 \left(-2 c^{4-4K} - c^2 + 1\right)}{4\, (c^2 + 1)} \to \frac{(c^2 - 1)\, \tau^2}{4\, (c^2 + 1)}, \tag{123}$$

and the rightmost expression for $C_4$ shows the limit value when the
number $K$ of first-order integrators coupled in cascade tends to infinity.
Let us next compute the Taylor expansion of

$$\arg \hat{h}_{exp}(\omega;\; \tau, c, K) = C_1\, \omega + C_3\, \omega^3 + O(\omega^5) \tag{124}$$

where the coefficients are given by

$$C_1 = \frac{\sqrt{\tau}\, c^{-K} \left(-c^2 + \sqrt{c^2-1}\, c - \sqrt{c^2-1}\, c^K + c\right)}{c - 1} \to -\sqrt{\frac{c+1}{c-1}}\, \sqrt{\tau}, \tag{125}$$

$$C_3 = \frac{\tau^{3/2}\, c^{-3K} \left(\sqrt{c^2-1} \left(c^{3K} + c^{3K+1} - c^4 - c^3\right) + \left(c^5 + c^4 + c^3\right)\right)}{3\, (c^2 + c + 1)} \to \frac{(c+1) \sqrt{c^2-1}\, \tau^{3/2}}{3\, (c^2 + c + 1)}, \tag{126}$$

and again the rightmost expressions for $C_1$ and $C_3$ show the limit
values when the number $K$ of scale levels tends to infinity.

Following the definition of cumulants $\kappa_n$ defined as the Taylor
coefficients of the logarithm of the Fourier transform

$$\log \hat{h}(\omega) = \sum_{n=0}^{\infty} \kappa_n \frac{(-i\omega)^n}{n!} \tag{127}$$

we obtain

$$\log \hat{h}_{exp}(\omega;\; \tau, c, K) = -C_1 (-i\omega) - C_2 (-i\omega)^2 + C_3 (-i\omega)^3 + C_4 (-i\omega)^4 + O(\omega^5)$$
$$= \kappa_0 + \frac{\kappa_1}{1!} (-i\omega) + \frac{\kappa_2}{2!} (-i\omega)^2 + \frac{\kappa_3}{3!} (-i\omega)^3 + \frac{\kappa_4}{4!} (-i\omega)^4 + O(\omega^5) \tag{128}$$

and can read the cumulants of the underlying temporal scale-space kernel
as $\kappa_0 = 0$, $\kappa_1 = -C_1$, $\kappa_2 = -2 C_2$, $\kappa_3 = 6 C_3$ and $\kappa_4 = 24 C_4$.
Specifically, the first-order moment $M_1$ and the higher-order central
moments $M_2$, $M_3$ and $M_4$ are related to the cumulants according to

$$M_1 = \kappa_1 = -C_1 \to \sqrt{\frac{c+1}{c-1}}\, \sqrt{\tau}, \tag{129}$$

$$M_2 = \kappa_2 = -2 C_2 = \tau, \tag{130}$$

$$M_3 = \kappa_3 = 6 C_3 \to \frac{2\, (c+1) \sqrt{c^2-1}\, \tau^{3/2}}{c^2 + c + 1}, \tag{131}$$

$$M_4 = \kappa_4 + 3 \kappa_2^2 = 24 C_4 + 12 C_2^2 \to \frac{3\, (3 c^2 - 1)\, \tau^2}{c^2 + 1}. \tag{132}$$

Thus, the skewness $\gamma_1$ and the kurtosis $\gamma_2$ measures of the corresponding
temporal scale-space kernels are given by

$$\gamma_1 = \frac{\kappa_3}{\kappa_2^{3/2}} = \frac{M_3}{M_2^{3/2}} = \frac{3 C_3}{\sqrt{2}\, (-C_2)^{3/2}} \to \frac{2\, (c+1) \sqrt{c^2-1}}{c^2 + c + 1}, \tag{133}$$

$$\gamma_2 = \frac{\kappa_4}{\kappa_2^2} = \frac{M_4}{M_2^2} - 3 = \frac{6 C_4}{C_2^2} \to \frac{6\, (c^2 - 1)}{c^2 + 1}. \tag{134}$$

Figure 9 shows graphs of these skewness and kurtosis measures as
function of the distribution parameter $c$ for the limit case when the number
of scale levels $K$ tends to infinity. As can be seen, both the skewness
and the kurtosis measures of the temporal scale-space kernels increase
with increasing values of the distribution parameter $c$.

[Figure 9 panels: skewness $\gamma_1(c)$; kurtosis $\gamma_2(c)$]

Fig. 9 Graphs of the skewness measure $\gamma_1$ (133) and the kurtosis measure
$\gamma_2$ (134) as function of the distribution parameter $c$ for the
time-causal scale-space kernels corresponding to $K$ truncated
exponential kernels having a logarithmic distribution of the intermediate
scale levels coupled in cascade, in the limit case when the number
of scale levels $K$ tends to infinity.

A.2 Uniform distribution of the intermediate scale levels

When using a uniform distribution of the intermediate scale levels,
the time constants of the individual first-order integrators are all
equal to $\sqrt{\tau/K}$ and the explicit expression for the Fourier transform is

$$\hat{h}_{exp}(\omega;\; \tau, K) = \frac{1}{\left(1 + i \sqrt{\frac{\tau}{K}}\, \omega\right)^K}. \tag{135}$$

Specifically, the magnitude and the phase of the Fourier transform are
given by

$$|\hat{h}_{exp}(\omega;\; \tau, K)| = \frac{1}{\left(1 + \frac{\tau}{K}\, \omega^2\right)^{K/2}}, \tag{136}$$

$$\arg \hat{h}_{exp}(\omega;\; \tau, K) = -K \arctan\left(\sqrt{\frac{\tau}{K}}\, \omega\right). \tag{137}$$

Let us rewrite the magnitude of the Fourier transform on exponential
form

$$|\hat{h}_{exp}(\omega;\; \tau, K)| = e^{\log |\hat{h}_{exp}(\omega;\; \tau, K)|} = e^{-\frac{K}{2} \log\left(1 + \frac{\tau}{K} \omega^2\right)} \tag{138}$$

and compute the Taylor expansion of

$$\log |\hat{h}_{exp}(\omega;\; \tau, K)| = C_2\, \omega^2 + C_4\, \omega^4 + O(\omega^6) \tag{139}$$

where

$$C_2 = -\frac{\tau}{2}, \tag{140}$$

$$C_4 = \frac{\tau^2}{4K}. \tag{141}$$

Next, let us compute the Taylor expansion of

$$\arg \hat{h}_{exp}(\omega;\; \tau, K) = C_1\, \omega + C_3\, \omega^3 + O(\omega^5) \tag{142}$$

where the coefficients are given by

$$C_1 = -\sqrt{K \tau}, \tag{143}$$

$$C_3 = \frac{\tau^{3/2}}{3 \sqrt{K}}. \tag{144}$$

Following the definition of cumulants $\kappa_n$ according to (127), we can
in an analogous way to (128) in the previous section read $\kappa_0 = 0$,
$\kappa_1 = -C_1$, $\kappa_2 = -2 C_2$, $\kappa_3 = 6 C_3$ and $\kappa_4 = 24 C_4$, and relate the
first-order moment $M_1$ and the higher-order central moments $M_2$, $M_3$ and
$M_4$ to the cumulants according to

$$M_1 = \kappa_1 = -C_1 = \sqrt{K \tau}, \tag{145}$$

$$M_2 = \kappa_2 = -2 C_2 = \tau, \tag{146}$$

$$M_3 = \kappa_3 = 6 C_3 = \frac{2 \tau^{3/2}}{\sqrt{K}}, \tag{147}$$

$$M_4 = \kappa_4 + 3 \kappa_2^2 = 24 C_4 + 12 C_2^2 = 3 \tau^2 + \frac{6 \tau^2}{K}. \tag{148}$$

Thus, the skewness $\gamma_1$ and the kurtosis $\gamma_2$ of the corresponding
temporal scale-space kernels are given by

$$\gamma_1 = \frac{\kappa_3}{\kappa_2^{3/2}} = \frac{M_3}{M_2^{3/2}} = \frac{2}{\sqrt{K}}, \tag{149}$$

$$\gamma_2 = \frac{\kappa_4}{\kappa_2^2} = \frac{M_4}{M_2^2} - 3 = \frac{6}{K}. \tag{150}$$

From these expressions we can note that when the number $K$ of first-order
integrators that are coupled in cascade increases, these skewness
and kurtosis measures tend to zero for the temporal scale-space kernels
having a uniform distribution of the intermediate temporal scale
levels. The corresponding skewness and kurtosis measures (133) and
(134) for the kernels having a logarithmic distribution of the intermediate
temporal scale levels do on the other hand remain strictly positive.
These properties reveal a fundamental difference between the two
classes of time-causal kernels obtained by distributing the intermediate
scale levels of first-order integrators coupled in cascade according to a
logarithmic vs. a uniform distribution.


B Comparison with Koenderink’s scale-time model

In his scale-time model, Koenderink [39] proposed to perform a logarithmic
mapping of the past via a time delay $\delta$ and then applying Gaussian
smoothing on the transformed domain, leading to a time-causal
kernel of the form (here largely following the notation in Florack,
result 4.6, page 116)

$$h_{log}(t;\; \sigma, \delta, a) = \frac{1}{\sqrt{2\pi}\, \sigma\, (\delta - a)}\, e^{-\frac{1}{2\sigma^2} \log^2\left(\frac{t - a}{\delta - a}\right)} \tag{151}$$

with $a$ denoting the present moment, $\delta$ denoting the time delay and $\sigma$
a dimensionless temporal scale parameter relative to the logarithmic
time axis. For simplicity, we will henceforth assume $a = 0$ leading to
kernels of the form

$$h_{log}(t;\; \sigma, \delta) = \frac{1}{\sqrt{2\pi}\, \sigma\, \delta}\, e^{-\frac{\log^2\left(\frac{t}{\delta}\right)}{2\sigma^2}} \tag{152}$$

and with convolution reversal of the time axis such that causality implies
$h_{log}(t;\; \sigma, \delta) = 0$ for $t < 0$. By integrating this kernel symbolically
in Mathematica, we find

$$\int_{t=-\infty}^{\infty} h_{log}(t;\; \sigma, \delta)\, dt = e^{\frac{\sigma^2}{2}} \tag{153}$$

implying that the corresponding time-causal kernel normalized to unit
$L_1$-norm should be

$$h_{Koe}(t;\; \sigma, \delta) = \frac{1}{\sqrt{2\pi}\, \sigma\, \delta}\, e^{-\frac{\log^2\left(\frac{t}{\delta}\right)}{2\sigma^2} - \frac{\sigma^2}{2}}. \tag{154}$$

The temporal mean of this kernel is

$$M_1 = \bar{t} = \int_{t=-\infty}^{\infty} t\, h_{Koe}(t;\; \sigma, \delta)\, dt = \delta\, e^{\frac{3\sigma^2}{2}} \tag{155}$$

and the higher-order central moments

$$M_2 = \int_{t=-\infty}^{\infty} (t - \bar{t})^2\, h_{Koe}(t;\; \sigma, \delta)\, dt = \delta^2 e^{3\sigma^2} \left(e^{\sigma^2} - 1\right), \tag{156}$$

$$M_3 = \int_{t=-\infty}^{\infty} (t - \bar{t})^3\, h_{Koe}(t;\; \sigma, \delta)\, dt = \delta^3 e^{\frac{9\sigma^2}{2}} \left(e^{\sigma^2} - 1\right)^2 \left(e^{\sigma^2} + 2\right), \tag{157}$$

$$M_4 = \int_{t=-\infty}^{\infty} (t - \bar{t})^4\, h_{Koe}(t;\; \sigma, \delta)\, dt \tag{158}$$

$$= \delta^4 e^{6\sigma^2} \left(e^{\sigma^2} - 1\right)^2 \left(3 e^{2\sigma^2} + 2 e^{3\sigma^2} + e^{4\sigma^2} - 3\right). \tag{159}$$

Thus, the skewness $\gamma_1$ and the kurtosis $\gamma_2$ of the temporal kernels in
Koenderink’s scale-time model are given by (see figure 10 for graphs)

$$\gamma_1 = \frac{\kappa_3}{\kappa_2^{3/2}} = \frac{M_3}{M_2^{3/2}} = \sqrt{e^{\sigma^2} - 1}\, \left(e^{\sigma^2} + 2\right), \tag{160}$$

$$\gamma_2 = \frac{\kappa_4}{\kappa_2^2} = \frac{M_4}{M_2^2} - 3 = 3 e^{2\sigma^2} + 2 e^{3\sigma^2} + e^{4\sigma^2} - 6. \tag{161}$$

[Figure 11 panels: $h(t;\; K = 7, c = \sqrt{2})$, $h_t(t;\; K = 7, c = \sqrt{2})$, $h_{tt}(t;\; K = 7, c = \sqrt{2})$]

Fig. 11 Comparison between the proposed time-causal kernels corresponding to the composition of truncated exponential kernels in cascade (blue
curves) for a logarithmic distribution of the intermediate scale levels and the temporal kernels in Koenderink's scale-time model (brown curves)
shown for both the original smoothing kernels and their first- and second-order temporal derivatives. All kernels correspond to temporal scale
(variance) $\tau = 1$ with the additional parameters determined such that the temporal mean values (the first-order temporal moments) become equal
in the limit case when the number of temporal scale levels $K$ tends to infinity (equation (164)). (top row) Logarithmic distribution of the temporal
scale levels for $c = \sqrt{2}$ and $K = 10$. (middle row) Corresponding results for $c = 2^{3/4}$ and $K = 10$. (bottom row) Corresponding results for
$c = 2$ and $K = 10$.


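If we want to relate these kernels in Koenderink’s scale-time model
to our time-causal scale-space kernels, a natural starting point is to
require that the total amount of temporal smoothing as measured by
the variances $M_2$ of the two kernels should be equal. Then, this implies
the relation

$$\tau = \delta^2 e^{3\sigma^2} \left(e^{\sigma^2} - 1\right). \tag{162}$$

If we additionally relate the kernels by enforcing the temporal delays
as measured by the first-order temporal moments to be equal, then we
obtain for the limit case when $K \to \infty$

$$\sqrt{\frac{c+1}{c-1}}\, \sqrt{\tau} = \delta\, e^{\frac{3\sigma^2}{2}}. \tag{163}$$

Solving the system of equations (162) and (163) then gives the following
mappings between the parameters in the two temporal scale-space
models

$$\begin{cases} \tau = \delta^2 e^{3\sigma^2} \left(e^{\sigma^2} - 1\right) \\ c = \dfrac{e^{\sigma^2}}{2 - e^{\sigma^2}} \end{cases} \qquad \begin{cases} \sigma = \sqrt{\log\left(\dfrac{2c}{c+1}\right)} \\ \delta = \dfrac{(c+1)^2 \sqrt{\tau}}{2 \sqrt{2}\, \sqrt{(c-1)\, c^3}} \end{cases} \tag{164}$$

which hold as long as $c > 1$ and $\sigma < \sqrt{\log 2} \approx 0.832$. Specifically,
for small values of $\sigma$ a series expansion of the relations to the left gives

$$\begin{cases} \tau = \delta^2 \sigma^2 \left(1 + \dfrac{7\sigma^2}{2} + \dfrac{37\sigma^4}{6} + \dfrac{175\sigma^6}{24} + O(\sigma^8)\right), \\ c = 1 + 2\sigma^2 + 3\sigma^4 + \dfrac{13\sigma^6}{3} + O(\sigma^8). \end{cases} \tag{165}$$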


If we additionally reparameterize the distribution parameter $c$ such that
$c = 2^a$ for some $a > 0$ and perform a series expansion, we obtain


o ' 2 — log (2 — e<T2 ) _ 2 cr 2 
log( 2 ) ~ log "2 



13(7® 

24 


+ 0(o- 8 ) 


(166) 


and with b = a log 2 to simplify the following expressions 


(167) 


These expressions relate the parameters in the two temporal scale- 
space models in the limit case when the number of temporal scale 
levels tends to infinity for the time-causal model based on first-order 
integrators coupled in cascade and with a logarithmic distribution of 
the intermediate temporal scale levels. 

For a general finite number of $K$, the corresponding relation to
(163) that identifies the first-order temporal moments does instead read

$$\bar{t} = \frac{c^{-K} \left(c^2 - \left(\sqrt{c^2-1} + 1\right) c + \sqrt{c^2-1}\, c^K\right)}{c - 1}\, \sqrt{\tau} = \delta\, e^{\frac{3\sigma^2}{2}}. \tag{168}$$

Solving the system of equations (162) and (168) then gives

[Figure 10 panel: skewness $\gamma_1(\sigma)$]

Fig. 10 Graphs of the skewness measure $\gamma_1$ (160) and the kurtosis
measure $\gamma_2$ (161) as function of the dimensionless temporal scale parameter
$\sigma$ relative to the logarithmic transformation of the past for the
time-causal kernels in Koenderink's scale-time model.


where 

A = 2c (c 4K - 4c k + 2 - 4 c K + 3 + 3 c 2K + 3 
-3 c 3K+2 + c 4K+1 + 2c 3 

+ (y/c^l - 1 ) c 3 * - (v^l - 4 ) c 2K+1 

+ (y/#=l + 5 ) c 2if + 2 - (V^l + 4) c 3K + 1 ) , 

(170) 

B = (c 2K - 2 c K+1 - 2 c K+2 + c 2K+1 + 2c 2 ) 2 , (171) 

C = (c 2 - (Vc 2 - 1 + l) c + Vc 2 - lc K ) , (172) 

D = c (c 4K - 4 c k + 2 - 4c k + 3 + 3c 2K + 3 - 3c 3K + 2 + c 4K+1 + 2c 3 

+ (v^i - l) c 3X - - 4 ) c 2K + 3 

+ (y^i + 5 ) c 2if + 2 - (v^T + 4 ) c 3K + 3 ) , 

(173) 

E = [c 2K - 2 c K+1 - 2 c K+2 + c 2K+1 + 2c 2 ) 2 . (174) 


Unfortunately, it is harder to derive a closed-form expression for c as 
function of a for a general (non-infinite) value of K. 

Figure 11 shows examples of kernels from the two families generated
for this mapping between the parameters in the two families of
temporal smoothing kernels for the limit case (164) when the num¬ 
ber of temporal scale levels tends to infinity. As can be seen from the 
graphs, the kernels from the two families do to a first approximation 
share qualitatively largely similar properties. From a more detailed in¬ 
spection, we can, however, note that the two families of kernels differ 
more in their temporal derivative responses in that: (i) the temporal 
derivative responses are lower and temporally more spread out (less 
peaky) in the time-causal scale-space model based on first-order inte¬ 
grators coupled in cascade compared to Koenderink’s scale-time model 


and (ii) the temporal derivative responses are somewhat faster in the 
temporal scale-space model based on first-order integrators coupled in 
cascade. 

A side effect of this analysis is that if we take the liberty of ap¬ 
proximating the limit case of the time-causal kernels corresponding to 
a logarithmic distribution of the intermediate scale levels by the kernels 
in Koenderink’s scale-time model with the parameters determined such 
that the first- and second-order temporal moments are equal, then we 
obtain the following approximate expression for the temporal location 
of the maximum point of the limit kernel 


$$t_{max} \approx \frac{(c+1)^2 \sqrt{\tau}}{2 \sqrt{2}\, \sqrt{(c-1)\, c^3}}. \tag{175}$$

From the discussion above, it follows that this estimate can be expected
to be an overestimate of the temporal location of the maximum point
of our time-causal kernels. This overestimate will, however, be better
than the previously mentioned overestimate in terms of the temporal
mean. For finite values of $K$ not corresponding to the limit case, we
can for higher accuracy alternatively estimate the position of the local
maximum from $\delta$ in (169).
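
As an illustration of how the parameter mapping for the limit case can be used in practice, the following sketch converts the parameters $(\tau, c)$ of the limit kernel into the parameters $(\sigma, \delta)$ of the scale-time kernel by moment matching, and then checks that the first- and second-order moments agree. The closed-form expressions for $\sigma$ and $\delta$ are obtained here by solving the two moment conditions, and the function name `koenderink_from_limit_kernel` is hypothetical.

```python
import math

def koenderink_from_limit_kernel(tau, c):
    """Map (tau, c) of the limit kernel to (sigma, delta) of the scale-time kernel."""
    # Obtained by equating the first- and second-order temporal moments (c > 1)
    sigma = math.sqrt(math.log(2.0 * c / (c + 1.0)))
    delta = (c + 1.0) ** 2 * math.sqrt(tau) / (2.0 * math.sqrt(2.0 * (c - 1.0) * c ** 3))
    return sigma, delta

tau, c = 1.0, 2.0
sigma, delta = koenderink_from_limit_kernel(tau, c)
# Moments of the scale-time kernel with these parameters
mean_koe = delta * math.exp(1.5 * sigma ** 2)
var_koe = delta ** 2 * math.exp(3.0 * sigma ** 2) * (math.exp(sigma ** 2) - 1.0)
# Temporal mean of the limit kernel, and the approximate position of its maximum
mean_limit = math.sqrt((c + 1.0) / (c - 1.0)) * math.sqrt(tau)
t_max_estimate = delta
```

Since the scale-time kernel attains its maximum at $t = \delta$, the value of `delta` doubles as the approximate temporal location of the maximum of the limit kernel, which is smaller than the temporal mean `mean_limit`.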


[Figure 12 panel: skewness $\gamma_1(c)$]

Fig. 12 Comparison between the skewness and the kurtosis measures
for the time-causal kernels corresponding to the limit case of $K$ first-order
integrators coupled in cascade when the number of temporal
scale levels $K$ tends to infinity (blue curves) and the corresponding
temporal kernels in Koenderink’s scale-time model (brown curves)
with the parameter values determined such that the first- and second-order
temporal moments are equal (equation (164)).


Figure 12 shows an additional quantification of the differences between
these two classes of temporal smoothing kernels by showing
how the skewness and the kurtosis measures vary as function of the
distribution parameter $c$ for the same mapping (164) between the
parameters in the two families of temporal smoothing kernels. As can
be seen from the graphs, both the skewness and the kurtosis measures
are higher for the kernels in Koenderink’s scale-time model compared
to our time-causal kernels corresponding to first-order integrators coupled
in cascade and do in these respects correspond to a larger deviation
from a Gaussian behaviour over the temporal domain. (Recall that for a
purely Gaussian temporal model all the cumulants of higher order than
two are zero, including the skewness and the kurtosis measures.)


C Scale invariance and covariance of scale-normalized 
temporal derivatives based on the limit kernel 

In this appendix we will show that in the special case when the temporal
scale-space concept is given by convolution with the limit kernel
according to (39) and (38), the corresponding scale-normalized derivatives
by either variance-based normalization (76) or $L_p$-normalization
(77) are perfectly scale invariant for temporal scaling transformations
with temporal scaling factors $S$ that are integer powers of the distribution
parameter $c$. As a prerequisite for this result, we start by deriving
the transformation property of scale-normalized derivatives by
$L_p$-normalization (77) under temporal scaling transformations.


C.1 Transformation property of $L_p$-norms of
scale-normalized temporal derivative kernels under
temporal scaling transformations

By differentiating the transformation property {44} of the limit kernel 
under scaling transformations for S = c - 7 

!L(f; r, c) = c 7 | L(c - 7 1\ c 2 jr,c) (176) 

we obtain 


(t; r, c) = e 7 (o’ t; c 2 jT,c) — c?( n + 1 ^'I't n -(c?t', c 2 jr,c). 

(177) 


The $L_p$-norm of the n:th-order derivative of the limit kernel at temporal 
scale $\tau = c^{2j}$ 

$$\| \Psi_{t^n}(\cdot;\, c^{2j}, c) \|_p^p = \int_{u=0}^{\infty} | \Psi_{t^n}(u;\, c^{2j}, c) |^p \, du \tag{178}$$

can then by the change of variables $u = c^j z$ with $du = c^j \, dz$ and using 
the transformation property (177) be transformed to the $L_p$-norm at 
temporal scale $\tau = 1$ according to 

$$\| \Psi_{t^n}(\cdot;\, c^{2j}, c) \|_p^p = \int_{z=0}^{\infty} | c^{-j(n+1)} \, \Psi_{t^n}(z;\, 1, c) |^p \, c^{j} \, dz = c^{-j(n+1)p + j} \, \| \Psi_{t^n}(\cdot;\, 1, c) \|_p^p \tag{179}$$

thus implying the following transformation property over scale 

$$\| \Psi_{t^n}(\cdot;\, c^{2j}, c) \|_p = c^{-j(n+1) + j/p} \, \| \Psi_{t^n}(\cdot;\, 1, c) \|_p. \tag{180}$$
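The scaling law (180) can be checked numerically. The following is a minimal sketch, not the paper's implementation: it assumes a truncation of the limit kernel to K = 4 first-order integrators with time constants mu_k = c^(-k) sqrt(c^2 - 1) sqrt(tau), evaluated in closed form by partial fractions. The function names and the grid parameters are illustrative choices. The check relies only on all time constants being proportional to sqrt(tau), so the truncated kernel obeys the same scaling property as (176).

```python
import numpy as np

def cascade_kernel_derivative(t, tau, c, n, K=4):
    """n-th derivative (n < K) of K first-order integrators in cascade.

    Assumption: time constants mu_k = c**(-k) * sqrt(c**2 - 1) * sqrt(tau),
    k = 1..K, i.e. a truncated approximation of the limit kernel.
    """
    mu = c ** (-np.arange(1.0, K + 1)) * np.sqrt((c ** 2 - 1.0) * tau)
    out = np.zeros_like(t)
    for i in range(K):
        # Partial-fraction coefficient of the i-th exponential term.
        weight = np.prod(mu[i] / (mu[i] - np.delete(mu, i)))
        out += weight * (-1.0 / mu[i]) ** n * np.exp(-t / mu[i]) / mu[i]
    return out

def lp_norm(values, dt, p):
    """Trapezoidal approximation of the L_p-norm on a uniform grid."""
    y = np.abs(values) ** p
    return (dt * (y.sum() - 0.5 * (y[0] + y[-1]))) ** (1.0 / p)

c, j = 2.0, 2
dt = 5e-4
t = dt * np.arange(400000)  # covers [0, 200), long enough for tau = c**(2*j)

def norm_ratio(n, p):
    """Ratio of the L_p-norms at tau = c**(2*j) and tau = 1."""
    scaled = lp_norm(cascade_kernel_derivative(t, c ** (2 * j), c, n), dt, p)
    unit = lp_norm(cascade_kernel_derivative(t, 1.0, c, n), dt, p)
    return scaled / unit

for n in (1, 2):
    for p in (1.0, 2.0):
        predicted = c ** (-j * (n + 1) + j / p)  # right-hand factor in (180)
        print(n, p, norm_ratio(n, p), predicted)
```

Both grids use the same fixed sampling, so the agreement between the measured ratio and the predicted factor is not built in by construction of the grid.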


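Thereby, the scale normalization factors for temporal derivatives in 
equation (78) 

$$\alpha_{n,\gamma}(c^{2j}) = \frac{G_{n,\gamma}}{\| \Psi_{t^n}(\cdot;\, c^{2j}, c) \|_p} = c^{j(n+1) - j/p} \, \frac{G_{n,\gamma}}{\| \Psi_{t^n}(\cdot;\, 1, c) \|_p} = c^{j(n+1) - j/p} \, \alpha_{n,\gamma}(1) \tag{181}$$

evolve in a similar way over temporal scales as the scaling factors of 
variance-based normalization (76) for $\tau = c^{2j}$ 

$$\tau^{n \gamma / 2} = c^{j n \gamma} \tag{182}$$

if and only if 

$$p = \frac{1}{1 + n(1 - \gamma)}. \tag{183}$$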
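The condition (183) follows from equating the exponents of $c^j$ in (181) and (182), that is $(n+1) - 1/p = n\gamma$. The short sketch below verifies this identity in exact rational arithmetic for a few sample values; the function name `matching_p` and the chosen values of n and gamma are illustrative only.

```python
from fractions import Fraction

def matching_p(n, gamma):
    """Value of p given by (183) for derivative order n and exponent gamma."""
    return 1 / (1 + n * (1 - gamma))

for n in (1, 2, 3):
    for gamma in (Fraction(1), Fraction(3, 4), Fraction(1, 2)):
        p = matching_p(n, gamma)
        lp_exponent = (n + 1) - 1 / p   # exponent of c**j in (181)
        var_exponent = n * gamma        # exponent of c**j in (182)
        assert lp_exponent == var_exponent
        print(f"n={n}, gamma={gamma}: p={p}")
```

For gamma = 1 the function returns p = 1 for every order n, which is the case singled out in the invariance result of Section C.2.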


C.2 Transformation property of scale-normalized temporal 
derivatives under temporal scaling transformations 


Consider two signals $f$ and $f'$ that are related by a temporal scaling 
transform $f'(t') = f(t)$ for $t' = c^{j'-j} t$ according to {ii) 

$$L'(t';\, \tau', c) = L(t;\, \tau, c) \tag{184}$$

between corresponding temporal scale levels $\tau' = c^{2(j'-j)} \tau$. By 
differentiating (184) and with $\partial_t = c^{j'-j} \partial_{t'}$ we obtain 

$$c^{n(j'-j)} \, L'_{t'^n}(t';\, \tau', c) = L_{t^n}(t;\, \tau, c). \tag{185}$$

Specifically, for any temporal scales $\tau' = c^{2j'}$ and $\tau = c^{2j}$ we have 

$$c^{n j'} \, L'_{t'^n}(t';\, c^{2j'}, c) = c^{n j} \, L_{t^n}(t;\, c^{2j}, c). \tag{186}$$


This implies that for the temporal scale-space concept defined by 
convolution with the limit kernel, scale-normalized derivatives computed 
with scale normalization factors defined by either $L_p$-normalization 
(181) for $p = 1$ or variance-based normalization (182) for $\gamma = 1$ will 
be equal 

$$L'_{\zeta'^n}(t';\, \tau', c) = L_{\zeta^n}(t;\, \tau, c) \tag{187}$$

between matching scale levels under temporal scaling transformations 
with temporal scaling factors $S = c^j$ that are integer powers of the 
distribution parameter $c$. 

More generally, for $L_p$-normalization for any value of $p$ with a 
corresponding $\gamma$-value according to (183) it holds that 

$$L'_{\zeta'^n}(t';\, \tau', c) = c^{(j'-j) n (\gamma - 1)} \, L_{\zeta^n}(t;\, \tau, c) = \{\text{eq. (183)}\} = c^{(j'-j)(1 - 1/p)} \, L_{\zeta^n}(t;\, \tau, c). \tag{188}$$

In the proof above, we have for the purpose of calculations related the 
evolution properties over scale relative to the temporal scale $\tau = 1$ 
and normalized the relative strengths between temporal derivatives of 
different order to the corresponding strengths $G_{n,\gamma}$ of $L_p$-norms of 
Gaussian derivatives. These assumptions are however not essential for 
the scaling properties, and corresponding scaling transformations can 
be derived relative to any other temporal base level $\tau_0$ as well as for 
other ways of normalizing the relative strengths of scale-normalized 
derivatives between different orders $n$ and distribution parameters $c$. 